Summary


In  many  scenarios,  we  want  our  model  of  the  world  to  be  able  to  grow  in  complexity  as  we  collect 
more  information  from  the  world  around  us.  This  growth  reflects  that  we  learn  more  about  the 
world as we acquire more data. We also wish to explicitly model both rare events and the potential for new events or latent outcomes that we have not yet experienced or collected data on. In
this  project,  we  have  developed  new  model  representations  that  enable  fast  and  efficient  inference, 
as  well  as  provided  and  proved  error  bounds  for  certain  classes  of  approximation.  Our  experiments 
below were on simulated data. We started preliminary work on the Innovian Time-Series Anesthesia Dataset but were given access to that data only a few weeks before the project concluded and were
not  able  to  run  our  full  experiments  in  that  time.  We  did  not  create  any  new  data  sets  as  part  of  this 
project. 
Introduction


In  many  data  sets,  we  can  view  the  data  points  as  exhibiting  a  collection  of  underlying  traits.  For 
instance,  each  document  in  the  New  York  Times  might  touch  on  a  number  of  topics  or  themes,  an 
individual’s  genetic  data  might  be  a  product  of  the  populations  to  which  their  ancestors  belonged, 
or  a  user’s  activity  on  a  social  network  might  be  dictated  by  their  varied  personal  interests.  When 
the  traits  are  not  directly  observed,  a  common  approach  is  to  model  each  trait  as  having  some 
frequency  or  rate  in  the  broader  population  [Airoldi  et  al.,  2014].  The  inferential  goal  is  to  learn 
these  rates  as  well  as  whether — and  to  what  extent — each  data  point  exhibits  each  trait.  Since  the 
traits  are  unknown  a  priori,  their  cardinality  is  also  typically  unknown. 

As  a  data  set  grows  larger,  we  can  reasonably  expect  the  number  of  traits  to  increase  as  well.  In 
the  cases  above,  for  example,  we  expect  to  uncover  more  topics  as  we  read  more  documents,  more 
ancestral  populations  as  we  examine  more  individuals’  genetic  data,  and  more  unique  interests  as 
we  observe  more  individuals  on  a  social  network.  Bayesian  nonparametric  (BNP)  priors  provide  a 
flexible,  principled  approach  to  creating  models  in  which  the  number  of  exhibited  traits  is  random, 
can  grow  without  bound,  and  may  be  learned  as  part  of  the  inferential  procedure.  By  generating  a 
countable  infinity  of  potential  traits — where  any  individual  data  point  exhibits  only  finitely  many — 
these  models  enable  growth  in  the  number  of  observed  traits  with  the  size  of  the  data  set. 

In practice, however, it is impossible to store a countable infinity of random variables in memory or learn the distribution over a countable infinity of variables in finite time. Conjugate priors and likelihoods have been developed [Orbanz, 2010] that theoretically circumvent the infinite representation altogether and perform exact Bayesian posterior inference [Broderick et al., 2017]. However, these priors and likelihoods are often just a single piece within a more complex generative model, and ultimately an approximate posterior inference scheme such as Markov Chain
Monte  Carlo  (MCMC)  or  variational  Bayes  (VB)  is  required.  These  approximation  schemes  often 
necessitate  a  full  and  explicit  representation  of  the  latent  variables. 

One  option  is  to  approximate  the  infinite-dimensional  prior  with  a  related  finite-dimensional 
prior:  that  is,  to  replace  the  infinite  collection  of  random  traits  by  a  finite  subset  of  “likely”  traits. 
To do so, first enumerate the countable infinity of traits in the full model and write $(\psi_k, \theta_k)$ for each paired trait $\psi_k$ (e.g., a topic in a document) and its rate or frequency $\theta_k$. Then the discrete measure $\Theta := \sum_{k=1}^{\infty} \theta_k \delta_{\psi_k}$ captures the traits/rates in a sequence indexed by $k$. The $(\psi_k, \theta_k)$ pairs are random in the Bayesian model, so $\Theta$ is a random measure. In many cases, the distribution of $\Theta$ can be defined by specifying a sequence of simple, familiar distributions for the finite-dimensional $\psi_k$ and $\theta_k$, known as a sequential representation. Given a sequential representation of $\Theta$, a natural way to choose a subset of traits is to keep the first $K < \infty$ traits and discard the rest, resulting in an approximate measure $\Theta_K$. This approach is called truncation. Note that it is also possible to truncate by removing atoms with weights less than a specified threshold [Argiento et al., 2016, Muliere and Tardella, 1998], though this approach is not as easily incorporated in posterior inference algorithms.

Sequential  representations  have  been  shown  to  exist  for  completely  random  measures 
(CRMs)  [Kingman,  1967,  Ferguson  and  Klass,  1972],  a  large  class  of  nonparametric  priors  that 
includes  such  popular  models  as  the  beta  process  [Hjort,  1990,  Kim,  1999]  and  the  gamma  process 
[Ferguson  and  Klass,  1972,  Kingman,  1975,  Brix,  1999,  Titsias,  2008,  James,  2013].  Numerous 
sequential  representations  of  CRMs  have  been  developed  in  the  literature  [Ferguson  and  Klass, 
1972,  Bondesson,  1982,  Rosinski,  1990,  2001,  James,  2014,  Broderick  et  al.,  2017].  CRM  priors 
are often paired with likelihood processes such as the Bernoulli process [Thibaux and Jordan, 2007], negative binomial process [Zhou et al., 2012, Broderick et al., 2015], and Poisson likelihood process [Titsias, 2008]. The likelihood process determines how much each trait is expressed by each data point. Sequential representations also exist for normalized completely random measures (NCRMs) (sometimes referred to as normalized random measures with independent increments) [Perman et al., 1992, Perman, 1993, James, 2002, Pitman, 2003, Regazzini et al., 2003, Lijoi and Prünster, 2010, James et al., 2009], which provide random distributions over traits, such as the Dirichlet process [Ferguson, 1973, Sethuraman, 1994]. NCRMs are typically paired with a likelihood that assigns each data point to a single trait using the NCRM as a discrete distribution.

Since (N)CRMs have many possible sequential representations, a method is required for determining which to use for the application at hand and, once a representation is selected, for choosing
a  truncation  level.  Our  main  contributions  enable  the  principled  selection  of  both  representation 
and  truncation  level  using  approximation  error: 

1. We provide a comprehensive characterization of the different types of sequential representations for (N)CRMs, filling in many gaps in the literature of sequential representations along the way. We classify these representations into two major groups: series representations, which are constructed by transforming a homogeneous Poisson point process; and superposition representations, which are the superposition of infinitely many Poisson point processes with finite rate measures. We also introduce two novel sequential representations for (N)CRMs.

2. We provide theoretical guarantees on the approximation error induced when truncating these sequential representations. We give the error as a function of the prior process, the likelihood process, and the level of truncation. While truncation error bounds for (N)CRMs have been studied previously, past work has focused on specific combinations of (N)CRM priors and likelihoods, in particular the Dirichlet-multinomial [Sethuraman, 1994, Ishwaran and James, 2001, Ishwaran and Zarepour, 2002, Blei and Jordan, 2006], beta-Bernoulli [Paisley et al., 2012, Doshi-Velez et al., 2009], generalized beta-Bernoulli [Roy, 2014], and gamma-Poisson [Roychowdhury and Kulis, 2015] processes. In the current work, we give much more general results for bounding the truncation error.

Our results fill in large gaps in the analysis of truncation error, which is often measured in terms of the $L^1$ (a.k.a. total variation) distance between the data distributions induced by the full and truncated priors. We provide the first analysis of truncation error for some sequential representations of the beta process with Bernoulli likelihood [Thibaux and Jordan, 2007], for the beta process with negative binomial likelihood [Zhou et al., 2012, Broderick et al., 2015], and for the normalization of the generalized gamma process [Brix, 1999], the $\alpha$-stable process, and the generalized inverse gamma [Lijoi et al., 2005, Lijoi and Prünster, 2010] with discrete likelihood. Moreover, even when truncation results already exist in the literature [Ishwaran and James, 2001, Doshi-Velez et al., 2009, Paisley et al., 2012, Roychowdhury and Kulis, 2015], we improve on those error bounds by a factor of two. The reduction arises from our use of the point process machinery of CRMs, circumventing the total variation bound used originally by Ishwaran and James [2001, 2002] upon which most modern truncation analyses are built. We obtain our truncation
error  guarantees  by  bounding  the  probability  that  data  drawn  from  the  full  model  will  use  a  feature 
that  is  not  available  to  the  truncated  model.  Thinking  in  terms  of  this  probability  provides  a  more 
intuitive  interpretation  of  our  bounds  that  can  be  communicated  to  practitioners  and  used  to  guide 
them  in  their  choice  of  truncation  level. 

The  remainder  of  this  paper  is  organized  as  follows.  In  Section  4.1,  we  provide  background 
material  on  CRMs  and  establish  notation.  In  our  first  main  theoretical  section,  Section  4.2,  we 
describe  seven  different  sequential  CRM  representations,  including  four  series  representations  and 
three  superposition  representations,  two  of  which  are  novel.  Next,  we  provide  a  general  theoretical 
analysis  of  the  truncation  error  for  series  and  superposition  representations  in  Section  4.3.  We 
provide  analogous  theory  for  the  normalized  versions  of  each  representation  in  Section  4.4  via  an 
infinite extension of the “Gumbel-max trick” [Gumbel, 1954, Maddison et al., 2014]. We determine the complexity of simulating each representation in Section 4.5. In Chapter 5, we summarize
our  results  (Table  1)  and  provide  advice  on  how  to  select  sequential  representations  in  practice. 
Proofs  for  all  results  developed  in  this  paper  are  provided  in  the  appendices. 
Methods, Assumptions, and Procedures


3.1  Background 

3.1.1  CRMs  and  truncation 

Consider a Poisson point process on $\mathbb{R}_+ := [0, \infty)$ with rate measure $\nu(d\theta)$ such that

\[ \nu(\mathbb{R}_+) = \infty \quad\text{and}\quad \int \min(1, \theta)\,\nu(d\theta) < \infty. \tag{1} \]

Such a process generates a countable infinity of values $(\theta_k)_{k=1}^{\infty}$, $\theta_k \in \mathbb{R}_+$, having an almost surely finite sum $\sum_{k=1}^{\infty} \theta_k < \infty$. In a BNP trait model, we interpret each $\theta_k$ as the rate or frequency of the $k$-th trait. Typically, each $\theta_k$ is paired with a parameter $\psi_k$ associated with the $k$-th trait (e.g., a topic in a document or a shared interest on a social network). We assume throughout that $\psi_k \in \Psi$ for some space $\Psi$ and $\psi_k \overset{\text{i.i.d.}}{\sim} G$ for some distribution $G$. Constructing a measure by placing mass $\theta_k$ at atom location $\psi_k$ results in a completely random measure (CRM) [Kingman, 1967]. As shorthand, we will write $\mathrm{CRM}(\nu)$ for the completely random measure generated as just described:

\[ \Theta := \sum_{k} \theta_k \delta_{\psi_k} \sim \mathrm{CRM}(\nu). \]

The trait distribution $G$ is left implicit in the notation as it has no effect on our results. Further, the possible fixed-location and deterministic components of a CRM [Kingman, 1967] are not considered here for brevity; these components can be added (assuming they are purely atomic) and the analysis modified without undue effort. The CRM prior on $\Theta$ is typically combined with a likelihood that generates trait counts for each data point. Let $h(\cdot \mid \theta)$ be a proper probability mass function on $\mathbb{N} \cup \{0\}$ for all $\theta$ in the support of $\nu$ (though the present work may be easily extended to likelihoods with support in $\mathbb{R}$). Then a collection of conditionally independent observations $X_{1:N} := \{X_n\}_{n=1}^{N}$ given $\Theta$ are distributed according to the likelihood process $\mathrm{LP}(h, \Theta)$, i.e.,

\[ X_n := \sum_{k} x_{nk}\delta_{\psi_k} \overset{\text{i.i.d.}}{\sim} \mathrm{LP}(h, \Theta), \]

if $x_{nk} \sim h(x \mid \theta_k)$ independently across $k$ and i.i.d. across $n$. The desideratum that each $X_n$ expresses a finite number of traits is encoded by the assumption that

\[ \int \left(1 - h(0 \mid \theta)\right)\nu(d\theta) < \infty. \tag{2} \]
Since the trait counts are typically latent in a full generative model specification, define the observed data $Y_n \mid X_n \overset{\text{indep}}{\sim} f(\cdot \mid X_n)$ for a conditional density $f$ with respect to a measure $\mu$ on some space. For instance, if the sequence $(\theta_k)_{k=1}^{\infty}$ represents the topic rates in a document corpus, $X_n$ might capture how many words in document $n$ are generated from each topic and $Y_n$ might be the observed collection of words for that document.

Since the sequence $(\theta_k)_{k=1}^{\infty}$ is countably infinite, it may be difficult to simulate or perform posterior inference in this model. One approximation scheme is to define the truncation $\Theta_K := \sum_{k=1}^{K} \theta_k \delta_{\psi_k}$. Since it is finite, the truncation $\Theta_K$ can be used for exact simulation or in posterior inference, but some error arises from not using the full CRM $\Theta$. To quantify this error, consider its propagation through the above Bayesian model. Define $Z_{1:N}$ and $W_{1:N}$ for $\Theta_K$ analogous to the definitions of $X_{1:N}$ and $Y_{1:N}$ for $\Theta$:

\[ Z_n \mid \Theta_K \overset{\text{i.i.d.}}{\sim} \mathrm{LP}(h, \Theta_K), \qquad W_n \mid Z_n \overset{\text{indep}}{\sim} f(\cdot \mid Z_n), \qquad n = 1, \dots, N. \]

A standard approach to measuring the distance between $\Theta$ and $\Theta_K$ is to use the $L^1$ metric between the marginal densities $p_{N,\infty}$ and $p_{N,K}$ (with respect to some measure $\mu$) of the final observations $Y_{1:N}$ and $W_{1:N}$ [Ishwaran and James, 2001, Doshi-Velez et al., 2009, Paisley et al., 2012]:

\[ \frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 := \frac{1}{2}\int \left|p_{N,\infty}(y_{1:N}) - p_{N,K}(y_{1:N})\right| \mu(dy_{1:N}). \]

All of our bounds on $\frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1$ are also bounds on the probability that $X_{1:N}$ contains a feature that is not in the truncation $\Theta_K$ (cf. Sections 4.3 and 4.4). This interpretation may be easier to digest since it does not depend on the observation model $f$ and is instead framed in terms of the underlying traits the practitioner is trying to estimate.

3.1.2  The  gamma-Poisson  process 

To  illustrate  the  practical  application  of  the  theoretical  developments  in  this  work,  we  provide  a 
number of examples throughout involving the gamma process [Brix, 1999], denoted $\Gamma\mathrm{P}(\gamma, \lambda, d)$, with discount parameter $d \in [0, 1)$, scale parameter $\lambda > 0$, mass parameter $\gamma > 0$, and rate measure

\[ \nu(d\theta) = \gamma \frac{\lambda^{1-d}}{\Gamma(1-d)}\, \theta^{-d-1} e^{-\lambda\theta}\, d\theta. \]

Setting $d = 0$ yields the undiscounted gamma process [Ferguson and Klass, 1972, Kingman, 1975, Titsias, 2008]. The gamma process is often paired with a Poisson likelihood,

\[ h(x \mid \theta) = \frac{\theta^{x}}{x!} e^{-\theta}. \]

Throughout the present work, we use the rate parametrization of the gamma distribution (to match the gamma process parametrization), for which the density is given by

\[ \mathrm{Gam}(x; a, b) = \frac{b^{a}}{\Gamma(a)}\, x^{a-1} e^{-bx}. \]

Section 4.6 provides additional example applications of our theoretical results for two other CRMs: the beta process $\mathrm{BP}(\gamma, \alpha, d)$ [Teh and Görür, 2009, Broderick et al., 2012] with Bernoulli or negative binomial likelihood, and the beta prime process $\mathrm{BPP}(\gamma, \alpha, d)$ [Broderick et al., 2017] with odds-Bernoulli likelihood.
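The three densities used in the gamma-Poisson examples can be written down directly. The following is a minimal sketch, assuming the rate measure has the generalized gamma form $\nu(d\theta) = \gamma \lambda^{1-d} \Gamma(1-d)^{-1} \theta^{-d-1} e^{-\lambda\theta} d\theta$ and the rate parametrization of the gamma density; the function names are hypothetical and chosen only for illustration.

```python
import math


def gamma_process_rate_density(theta, gamma, lam, d):
    """Density of the (assumed) gamma process rate measure at theta > 0."""
    log_const = math.log(gamma) + (1.0 - d) * math.log(lam) - math.lgamma(1.0 - d)
    return math.exp(log_const - (d + 1.0) * math.log(theta) - lam * theta)


def poisson_pmf(x, theta):
    """Poisson likelihood h(x | theta) for a count x >= 0 and rate theta > 0."""
    return math.exp(x * math.log(theta) - theta - math.lgamma(x + 1.0))


def gamma_pdf(x, a, b):
    """Gamma density with shape a and rate b (rate parametrization)."""
    return math.exp(a * math.log(b) - math.lgamma(a)
                    + (a - 1.0) * math.log(x) - b * x)
```

Working in log space with `math.lgamma` is a design choice to avoid overflow for large counts or shape parameters; it does not change the formulas.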
3.2 Sequential representations


Sequential representations are at the heart of the study of truncated CRMs. They provide an iterative method that can be terminated at any point to yield a finite approximation to the infinite process, where the choice of termination point determines the accuracy of the approximation. Thus, the natural first step in providing a coherent treatment of truncation analysis is to do the same for sequential representations. In past work, two major classes of sequential representation have been used: series representations of the form $\Theta = \sum_{k=1}^{\infty} \theta_k \delta_{\psi_k}$, and superposition representations of the form $\Theta = \sum_{k=1}^{\infty}\sum_{i=1}^{C_k} \theta_{ki}\delta_{\psi_{ki}}$, where each inner sum of $C_k$ atoms is itself a CRM. This section examines four series representations [Ferguson and Klass, 1972, Bondesson, 1982, Rosinski, 1990, 2001] and three superposition representations (two of which are novel) [Broderick et al., 2012, 2017, James, 2014]. We show how previously developed sequential representations for specific CRMs fit into these seven general representations. Finally, we discuss a stochastic mapping procedure that is useful in obtaining new representations from the transformation of others. Proofs for the results in this section may be found in Appendix A.2.

3.2.1  Series  representations 

Series representations arise from the transformation of a homogeneous Poisson point process [Rosinski, 2001]. They tend to be somewhat difficult to analyze due to the dependence between the atoms but also tend to produce very simple representations with small truncation error (cf. Section 4.3 and Chapter 5). Throughout the paper we let $\Gamma_k = \sum_{i=1}^{k} E_i$, $E_i \overset{\text{i.i.d.}}{\sim} \mathrm{Exp}(1)$, be the ordered jumps of a unit-rate homogeneous Poisson process on $\mathbb{R}_+$, let $\nu$ be a measure on $\mathbb{R}_+$ satisfying the basic conditions in Eq. (1), and let $\psi_k \overset{\text{i.i.d.}}{\sim} G$.

Inverse-Lévy [Ferguson and Klass, 1972] Define $\nu^{\leftarrow}(u) := \inf\{x : \nu([x, \infty)) \le u\}$, the inverse of the tail measure $\nu([x, \infty))$. We say $\Theta$ has an inverse-Lévy representation and write $\Theta \leftarrow \text{IL-Rep}(\nu)$ if

\[ \Theta = \sum_{k=1}^{\infty} \theta_k \delta_{\psi_k}, \quad\text{with}\quad \theta_k = \nu^{\leftarrow}(\Gamma_k). \]

Ferguson and Klass [1972] showed that $\Theta \leftarrow \text{IL-Rep}(\nu)$ implies $\Theta \sim \mathrm{CRM}(\nu)$. The inverse-Lévy representation is analogous to the inverse CDF method for generating an arbitrary random variable from a uniform random variable, with the homogeneous Poisson process playing the role of the uniform random variable. It is also the optimal sequential representation in the sense that the sequence $\theta_k$ that it generates is non-increasing. While an elegant and general approach, simulating the inverse-Lévy representation is difficult, as inverting the function $\nu([x, \infty))$ is analytically intractable except in a few cases.

Example 4.2.1 (Gamma process, $\Gamma\mathrm{P}(\gamma, \lambda, 0)$). We have $\nu([x, \infty)) = \gamma\lambda E_1(\lambda x)$, where $E_1(x) := \int_x^{\infty} u^{-1}e^{-u}\,du$ is the exponential integral function [Abramowitz and Stegun, 1964]. The inverse-Lévy representation for $\Gamma\mathrm{P}(\gamma, \lambda, 0)$ is thus

\[ \Theta = \sum_{k=1}^{\infty} \lambda^{-1} E_1^{-1}\!\left(\gamma^{-1}\lambda^{-1}\Gamma_k\right)\delta_{\psi_k}. \]

Neither $E_1$ nor its inverse can be computed in closed form, so one must resort to numerical approximations.
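One possible numerical approach to Example 4.2.1 is sketched below. It is an illustration under stated assumptions, not the implementation used in this work: $E_1$ is evaluated with its power series for $x \le 1$ and a continued fraction (modified Lentz iteration) otherwise, and the inverse is found by bisection on $\log x$. All function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x):
    """E_1(x) = int_x^inf u^{-1} e^{-u} du for x > 0."""
    if x <= 1.0:
        # Series: -gamma - ln x - sum_{n>=1} (-x)^n / (n * n!)
        total, term = 0.0, 1.0
        for n in range(1, 60):
            term *= -x / n           # term = (-x)^n / n!
            total -= term / n
        return -EULER_GAMMA - math.log(x) + total
    # Continued fraction, evaluated with the modified Lentz method.
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 500):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def inverse_e1(y, iters=200):
    """Solve E_1(x) = y by bisection on log x (E_1 is strictly decreasing)."""
    lo, hi = math.log(1e-300), math.log(700.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if exp_integral_e1(math.exp(mid)) > y:
            lo = mid   # E_1 too large, so x must be larger
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


def inverse_levy_gamma_process(gamma, lam, K, rng):
    """First K rates theta_k = E_1^{-1}(Gamma_k / (gamma * lam)) / lam."""
    thetas, jump = [], 0.0
    for _ in range(K):
        jump += rng.expovariate(1.0)   # Gamma_k: unit-rate Poisson process jump
        thetas.append(inverse_e1(jump / (gamma * lam)) / lam)
    return thetas
```

For example, `inverse_levy_gamma_process(2.0, 1.0, 20, random.Random(0))` returns twenty rates in non-increasing order, which is the ordering property noted above for the inverse-Lévy representation.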

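Bondesson [Bondesson, 1982] We say $\Theta$ has a Bondesson representation and write $\Theta \leftarrow \text{B-Rep}(c, g)$ if for $c > 0$ and $g$ a density on $\mathbb{R}_+$,

\[ \Theta = \sum_{k=1}^{\infty} \theta_k \delta_{\psi_k}, \quad\text{with}\quad \theta_k = V_k e^{-\Gamma_k / c}, \quad V_k \overset{\text{i.i.d.}}{\sim} g. \]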

Theorem 4.2.1 shows that Bondesson representations can be constructed for a large, albeit restricted, class of CRM rate measures. We offer a novel proof of Theorem 4.2.1 in Appendix A.2 using the induction strategy introduced by Banjevic et al. [2002]. Similar proof ideas are also used to prove truncation error bounds for sequential representations in Section 4.3. We use a slight abuse of notation for brevity: if $\nu(d\theta)$ is a measure on $\mathbb{R}_+$ that is absolutely continuous with respect to Lebesgue measure, then $\nu(\theta)$ is the density of $\nu(d\theta)$ with respect to the Lebesgue measure.

Theorem 4.2.1 (Bondesson representation [Bondesson, 1982]). Let $\nu(d\theta) = \nu(\theta)\,d\theta$ be a rate measure satisfying Eq. (1). If $\theta\nu(\theta)$ is non-increasing, $\lim_{\theta\to\infty} \theta\nu(\theta) = 0$, and $c_\nu := \lim_{\theta\to 0} \theta\nu(\theta) < \infty$, then $g_\nu(v) := -c_\nu^{-1} \frac{d}{dv}\left[v\nu(v)\right]$ is a density on $\mathbb{R}_+$ and

\[ \Theta \leftarrow \text{B-Rep}(c_\nu, g_\nu) \quad\text{implies}\quad \Theta \sim \mathrm{CRM}(\nu). \]

Example 4.2.2 (Bondesson representation for $\Gamma\mathrm{P}(\gamma, \lambda, 0)$). The following representation for the gamma process with $d = 0$ was described by Bondesson [1982] and Banjevic et al. [2002]. Since $\theta\nu(\theta) = \gamma\lambda e^{-\lambda\theta}$ is non-increasing and $c_\nu = \lim_{\theta\to 0} \theta\nu(\theta) = \gamma\lambda$, we obtain $g_\nu(v) = \lambda e^{-\lambda v} = \mathrm{Exp}(v; \lambda)$. Thus, it follows from Theorem 4.2.1 that if $\Theta \leftarrow \text{B-Rep}(\gamma\lambda, \mathrm{Exp}(\lambda))$, then $\Theta \sim \Gamma\mathrm{P}(\gamma, \lambda, 0)$. The condition that $\theta\nu(\theta)$ is non-increasing fails to hold if $d > 0$, so we cannot apply Theorem 4.2.1 to $\Gamma\mathrm{P}(\gamma, \lambda, d)$ when $d > 0$.
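The Bondesson representation of Example 4.2.2 is simple to simulate once truncated at a level $K$. The sketch below is illustrative only (the function name and the choice of Python's `random` module are assumptions): it draws $\theta_k = V_k e^{-\Gamma_k/(\gamma\lambda)}$ with $V_k \sim \mathrm{Exp}(\lambda)$ for $k = 1, \dots, K$.

```python
import math
import random


def bondesson_gamma_process(gamma, lam, K, rng):
    """First K rates of B-Rep(gamma * lam, Exp(lam)), i.e. the d = 0 gamma process."""
    c = gamma * lam
    thetas, jump = [], 0.0
    for _ in range(K):
        jump += rng.expovariate(1.0)   # Gamma_k: unit-rate Poisson process jump
        v = rng.expovariate(lam)       # V_k ~ Exp(lam), with lam a rate
        thetas.append(v * math.exp(-jump / c))
    return thetas
```

A quick sanity check on such a sampler: the expected total mass $\int \theta\,\nu(d\theta)$ of $\Gamma\mathrm{P}(\gamma, \lambda, 0)$ is $\gamma$, and the expected mass of the first $K$ atoms is $\gamma\left(1 - (c/(1+c))^{K}\right)$ with $c = \gamma\lambda$, so the truncated mass approaches $\gamma$ geometrically in $K$.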

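Thinning [Rosinski, 1990] Using the nomenclature of Rosinski [2001], we say $\Theta$ has a thinning representation and write $\Theta \leftarrow \text{T-Rep}(\nu, g)$ if $g$ is a probability measure on $\mathbb{R}_+$ such that $\nu$ is absolutely continuous with respect to $g$, i.e. $\nu \ll g$, and

\[ \Theta = \sum_{k=1}^{\infty} \theta_k \delta_{\psi_k}, \quad\text{with}\quad \theta_k = V_k \mathbb{1}\!\left(\frac{d\nu}{dg}(V_k) \ge \Gamma_k\right), \quad V_k \overset{\text{i.i.d.}}{\sim} g. \]

Rosinski [1990] showed that $\Theta \leftarrow \text{T-Rep}(\nu, g)$ implies $\Theta \sim \mathrm{CRM}(\nu)$. Note that $\Gamma_k \to \infty$ almost surely as $k \to \infty$, so the probability that $\frac{d\nu}{dg}(V_k) \ge \Gamma_k$ is decreasing in $k$. Thus, this representation generates atoms with $\theta_k = 0$ (which have no effect and can be removed) increasingly frequently and becomes inefficient as $k \to \infty$.

Example 4.2.3 (Thinning representation for $\Gamma\mathrm{P}(\gamma, \lambda, d)$). If we let $g = \mathrm{Gam}(1 - d, \lambda)$, then the thinning representation for $\Gamma\mathrm{P}(\gamma, \lambda, d)$ is

\[ \Theta = \sum_{k=1}^{\infty} V_k \mathbb{1}\!\left(V_k \Gamma_k \le \gamma\right)\delta_{\psi_k}, \quad\text{with}\quad V_k \overset{\text{i.i.d.}}{\sim} \mathrm{Gam}(1 - d, \lambda). \]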
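The thinning representation of Example 4.2.3 can be sketched as follows. This is a minimal illustration with a hypothetical function name; note that `random.gammavariate` takes a shape and a scale, so the rate $\lambda$ is passed as the scale `1 / lam`.

```python
import random


def thinning_gamma_process(gamma, lam, d, K, rng):
    """First K rates theta_k = V_k * 1(V_k * Gamma_k <= gamma), V_k ~ Gam(1 - d, lam)."""
    thetas, jump = [], 0.0
    for _ in range(K):
        jump += rng.expovariate(1.0)               # Gamma_k
        v = rng.gammavariate(1.0 - d, 1.0 / lam)   # shape 1 - d, scale 1 / lam
        thetas.append(v if v * jump <= gamma else 0.0)
    return thetas
```

Running this with a large `K` makes the inefficiency discussed above visible: almost all late entries are zero, because the event $V_k\Gamma_k \le \gamma$ becomes rare as $\Gamma_k$ grows.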
Rejection [Rosinski, 2001] Using the nomenclature of Rosinski [2001], we say $\Theta$ has a rejection representation and write $\Theta \leftarrow \text{R-Rep}(\nu, \mu)$ if $\mu$ is a measure on $\mathbb{R}_+$ satisfying Eq. (1) and $\frac{d\nu}{d\mu} \le 1$, and

\[ \Theta = \sum_{k=1}^{\infty} \theta_k \delta_{\psi_k}, \quad\text{with}\quad \theta_k = V_k \mathbb{1}\!\left(\frac{d\nu}{d\mu}(V_k) \ge U_k\right), \quad (V_k)_{k\in\mathbb{N}} \sim \mathrm{PoissP}(\mu), \quad U_k \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}[0, 1]. \]

Rosinski [2001] showed that $\Theta \leftarrow \text{R-Rep}(\nu, \mu)$ implies $\Theta \sim \mathrm{CRM}(\nu)$. This representation is very similar to the thinning representation, except that the sequence $(V_k)_{k\in\mathbb{N}}$ is generated from a Poisson process on $\mathbb{R}_+$ rather than i.i.d. This allows $V_k \to 0$ as $k \to \infty$, causing the frequency of generating ineffective atoms $\theta_k = 0$ to decay as $k \to \infty$, assuming $\mu$ is appropriately chosen such that $\frac{d\nu}{d\mu}(\theta) \to 1$ as $\theta \to 0$. This representation can thus be constructed to be more efficient than the thinning representation. We can calculate the efficiency in terms of the expected number of rejections (that is, the number of $\theta_k$ that are identically zero):

Proposition 4.2.2. For $\text{R-Rep}(\nu, \mu)$, the expected number of rejections is

\[ \mathbb{E}\left[\sum_{k=1}^{\infty} \mathbb{1}(\theta_k = 0)\right] = \int \left(1 - \frac{d\nu}{d\mu}(x)\right)\mu(dx). \]

Remark. If $\mu$ and $\nu$ can be written as densities with respect to Lebesgue measure, then the integral in Proposition 4.2.2 can be rewritten as $\int (\mu(x) - \nu(x))\,dx$.

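Example 4.2.4 (Rejection representation for $\Gamma\mathrm{P}(\gamma, \lambda, 0)$). Following Rosinski [2001], consider $\mu(d\theta) = \gamma\lambda\,\theta^{-1}(1 + \lambda\theta)^{-1}\,d\theta$. We call $\mathrm{CRM}(\mu)$ the Lomax process, $\mathrm{LomP}(\gamma, \lambda^{-1})$, after the related Lomax distribution. We can use the inverse-Lévy method analytically with $\mu$ since $\mu^{\leftarrow}(u) = \frac{1}{\lambda\left(e^{(\gamma\lambda)^{-1}u} - 1\right)}$. Thus, the rejection representation of $\Gamma\mathrm{P}(\gamma, \lambda, 0)$ is

\[ \Theta = \sum_{k=1}^{\infty} V_k \mathbb{1}\!\left(U_k \le (1 + \lambda V_k)e^{-\lambda V_k}\right)\delta_{\psi_k}, \quad\text{with}\quad V_k = \frac{1}{\lambda\left(e^{(\gamma\lambda)^{-1}\Gamma_k} - 1\right)}, \quad U_k \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}[0, 1]. \]

Unlike in the thinning construction given in Example 4.2.3, only a finite number of rates will be set to zero almost surely. In particular, the expected number of rejections is $\gamma\lambda c_\gamma$, where $c_\gamma$ is the Euler-Mascheroni constant.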
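The rejection representation of Example 4.2.4 admits a short simulation sketch, given below under the assumption that the proposal atoms are generated as $V_k = \mu^{\leftarrow}(\Gamma_k)$ with the Lomax-process inverse stated above. The function name is hypothetical. `math.expm1` is used for accuracy when $\Gamma_k/(\gamma\lambda)$ is small; it raises `OverflowError` if that ratio exceeds roughly 709, which limits the usable truncation level for small $\gamma\lambda$.

```python
import math
import random


def rejection_gamma_process(gamma, lam, K, rng):
    """First K rates of the rejection representation of the d = 0 gamma process."""
    thetas, jump = [], 0.0
    for _ in range(K):
        jump += rng.expovariate(1.0)                          # Gamma_k
        v = 1.0 / (lam * math.expm1(jump / (gamma * lam)))    # V_k from the Lomax process
        u = rng.random()                                      # U_k ~ Unif[0, 1]
        accept = u <= (1.0 + lam * v) * math.exp(-lam * v)    # d(nu)/d(mu) at V_k
        thetas.append(v if accept else 0.0)
    return thetas
```

Averaging the number of zero entries over many independent runs gives an empirical estimate that can be compared with the expected number of rejections $\gamma\lambda c_\gamma$ quoted in the example.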


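Example 4.2.5 (Rejection representation for $\Gamma\mathrm{P}(\gamma, \lambda, d)$, $d > 0$). For the case of $d > 0$, we instead use $\mu(d\theta) = \gamma\frac{\lambda^{1-d}}{\Gamma(1-d)}\theta^{-1-d}\,d\theta$. We can again use the inverse-Lévy method analytically with $\mu$ since $\mu^{\leftarrow}(u) = (\gamma' u^{-1})^{1/d}$, where $\gamma' := \gamma\frac{\lambda^{1-d}}{d\,\Gamma(1-d)}$. The rejection representation is then

\[ \Theta = \sum_{k=1}^{\infty} V_k \mathbb{1}\!\left(U_k \le e^{-\lambda V_k}\right)\delta_{\psi_k}, \quad\text{with}\quad V_k = (\gamma'\Gamma_k^{-1})^{1/d}, \quad U_k \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}[0, 1]. \]

The expected number of rejections is $\frac{\gamma\lambda}{d}$, so the representation is efficient for large $d$, but extremely inefficient when $d$ is small.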
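The expected number of rejections in Example 4.2.5 can be checked with the Remark following Proposition 4.2.2. The derivation below is a sketch that assumes the proposal density $\mu(\theta) = \gamma\lambda^{1-d}\Gamma(1-d)^{-1}\theta^{-1-d}$ and the gamma process density $\nu(\theta) = \mu(\theta)e^{-\lambda\theta}$, and uses one integration by parts; the boundary terms vanish for $0 < d < 1$.

```latex
\begin{aligned}
\int_0^\infty \bigl(\mu(\theta) - \nu(\theta)\bigr)\,d\theta
  &= \gamma\frac{\lambda^{1-d}}{\Gamma(1-d)}
     \int_0^\infty \theta^{-1-d}\bigl(1 - e^{-\lambda\theta}\bigr)\,d\theta \\
  &= \gamma\frac{\lambda^{1-d}}{\Gamma(1-d)}
     \left(\Bigl[-\tfrac{1}{d}\theta^{-d}\bigl(1 - e^{-\lambda\theta}\bigr)\Bigr]_0^\infty
     + \frac{\lambda}{d}\int_0^\infty \theta^{-d}e^{-\lambda\theta}\,d\theta\right) \\
  &= \gamma\frac{\lambda^{1-d}}{\Gamma(1-d)}\cdot\frac{\lambda}{d}\cdot
     \frac{\Gamma(1-d)}{\lambda^{1-d}}
   = \frac{\gamma\lambda}{d}.
\end{aligned}
```

The result grows without bound as $d \to 0$, which matches the statement that the representation is extremely inefficient for small $d$.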
3.2.2 Superposition representations

Superposition representations arise as an infinite sum of CRMs with finite rate measure. These tend to be easier to analyze than series representations as they decouple atoms between the summed CRMs, but can produce representations with larger truncation error (cf. Section 4.3 and Chapter 5). Throughout, let $\nu$ be a measure on $\mathbb{R}_+$ satisfying the basic conditions in Eq. (1), and let $\psi_{ki} \overset{\text{i.i.d.}}{\sim} G$.

Decoupled Bondesson We say $\Theta$ has a decoupled Bondesson representation and write $\Theta \leftarrow \text{DB-Rep}(c, g, \xi)$ if for $c > 0$, $\xi > 0$, and $g$ a density on $\mathbb{R}_+$,

\[ \Theta = \sum_{k=1}^{\infty}\sum_{i=1}^{C_k} \theta_{ki}\delta_{\psi_{ki}}, \quad\text{with}\quad C_k \overset{\text{indep}}{\sim} \mathrm{Poiss}(c/\xi), \quad \theta_{ki} = V_{ki}e^{-T_{ki}}, \quad T_{ki} \overset{\text{indep}}{\sim} \mathrm{Gam}(k, \xi), \quad V_{ki} \overset{\text{i.i.d.}}{\sim} g. \tag{3} \]

This is a novel superposition representation, though special cases are already known [Paisley et al., 2010, Roychowdhury and Kulis, 2015]. Theorem 4.2.3 shows that the decoupled Bondesson representation applies to the same class of CRMs as the Bondesson representation from Section 4.2.1.

Theorem 4.2.3 (Decoupled Bondesson representation). Let $\nu(d\theta) = \nu(\theta)\,d\theta$, $c_\nu$, and $g_\nu$ be as specified in Theorem 4.2.1. Then for any fixed $\xi > 0$,

\[ \Theta \leftarrow \text{DB-Rep}(c_\nu, g_\nu, \xi) \quad\text{implies}\quad \Theta \sim \mathrm{CRM}(\nu). \]

The proof of Theorem 4.2.3 in Appendix A.2 generalizes the arguments from Paisley et al. [2010] and Roychowdhury and Kulis [2015]. The free parameter $\xi$ controls the number of atoms generated for each outer sum index $k$; its principled selection can be made by trading off computational complexity (cf. Section 4.5) and truncation error (cf. Section 4.3).

Example 4.2.6 (Decoupled Bondesson representation for $\Gamma\mathrm{P}(\gamma, \lambda, 0)$). Arguments paralleling those made in Example 4.2.2 show that the $\Gamma\mathrm{P}(\gamma, \lambda, 0)$ representation from Roychowdhury and Kulis [2015] follows directly from an application of Theorem 4.2.3: if $\Theta \leftarrow \text{DB-Rep}(\gamma\lambda, \mathrm{Exp}(\lambda), \xi)$, then $\Theta \sim \Gamma\mathrm{P}(\gamma, \lambda, 0)$. As in the Bondesson representation setting, Theorem 4.2.3 does not apply to $\Gamma\mathrm{P}(\gamma, \lambda, d)$ when $d > 0$ because the condition that $\theta\nu(\theta)$ is non-increasing fails to hold.

Size-biased [Broderick et al., 2017, James, 2014] Let $\pi(\theta) := h(0 \mid \theta)$. We say $\Theta$ has a size-biased representation and write $\Theta \leftarrow \text{SB-Rep}(\nu, h)$ if

\[ \Theta = \sum_{k=1}^{\infty}\sum_{i=1}^{C_k} \theta_{ki}\delta_{\psi_{ki}}, \quad\text{with}\quad C_k \overset{\text{indep}}{\sim} \mathrm{Poiss}(\eta_k), \quad \theta_{ki} \overset{\text{indep}}{\sim} \frac{1}{\eta_k}\pi(\theta)^{k-1}\left(1 - \pi(\theta)\right)\nu(d\theta), \tag{4} \]

\[ \eta_k := \int \pi(\theta)^{k-1}\left(1 - \pi(\theta)\right)\nu(d\theta). \]

Broderick et al. [2017] and James [2014] showed that $\Theta \leftarrow \text{SB-Rep}(\nu, h)$ implies $\Theta \sim \mathrm{CRM}(\nu)$. If the rate measure $\nu$ and the likelihood $h$ are selected to be a conjugate exponential family then, noting that $\sum_{x=1}^{\infty} h(x \mid \theta) = 1 - \pi(\theta)$, the rate $\theta_{ki}$ can be sampled from a mixture of exponential family distributions:

\[ \theta_{ki} \mid z_{ki} \overset{\text{indep}}{\sim} \frac{1}{\eta_{k z_{ki}}} h(z_{ki} \mid \theta)\,\pi(\theta)^{k-1}\nu(d\theta), \qquad z_{ki} \overset{\text{indep}}{\sim} \mathrm{Categorical}\!\left((\eta_{kx}/\eta_k)_{x=1}^{\infty}\right), \]

\[ \eta_{kx} := \int h(x \mid \theta)\,\pi(\theta)^{k-1}\nu(d\theta). \]

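Example 4.2.7 (Size-biased representation for $\Gamma\mathrm{P}(\gamma, \lambda, d)$). For the gamma process, values for $\eta_{kx}$ and $\eta_k$ can be found using integration by parts and the standard gamma distribution integral, while $\theta_{ki} \mid z_{ki}$ is sampled from a gamma distribution by inspection:

\[ \eta_{kx} = \frac{\gamma\lambda^{1-d}\,\Gamma(x - d)}{x!\,\Gamma(1 - d)\,(\lambda + k)^{x-d}}, \qquad \eta_k = \begin{cases} \dfrac{\gamma\lambda^{1-d}}{d}\left((\lambda + k)^{d} - (\lambda + k - 1)^{d}\right) & d > 0 \\[1ex] \gamma\lambda\left(\log(\lambda + k) - \log(\lambda + k - 1)\right) & d = 0 \end{cases} \]

\[ \theta_{ki} \mid z_{ki} \overset{\text{indep}}{\sim} \mathrm{Gam}(z_{ki} - d, \lambda + k). \]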
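The quantities in Example 4.2.7 translate into a short simulation sketch for one round $k$ of the size-biased representation. The code below is illustrative only: function names, the cap `x_max` on the categorical search, and the Poisson sampler (which counts unit-rate exponential arrivals, since Python's standard library has no Poisson generator) are all assumptions made for this example.

```python
import math
import random


def eta_kx(k, x, gamma, lam, d):
    """eta_{kx} for the gamma process with Poisson likelihood, x >= 1."""
    log_val = (math.log(gamma) + (1.0 - d) * math.log(lam) + math.lgamma(x - d)
               - math.lgamma(x + 1.0) - math.lgamma(1.0 - d)
               - (x - d) * math.log(lam + k))
    return math.exp(log_val)


def eta_k(k, gamma, lam, d):
    """eta_k = sum over x >= 1 of eta_{kx}, in closed form."""
    if d > 0.0:
        return gamma * lam ** (1.0 - d) / d * ((lam + k) ** d - (lam + k - 1.0) ** d)
    return gamma * lam * (math.log(lam + k) - math.log(lam + k - 1.0))


def sample_poisson(mean, rng):
    """Poisson draw: number of unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def size_biased_round(k, gamma, lam, d, rng, x_max=10000):
    """Rates theta_{ki}, i = 1..C_k, for round k of the size-biased representation."""
    total = eta_k(k, gamma, lam, d)
    rates = []
    for _ in range(sample_poisson(total, rng)):
        # Inverse-CDF draw of z_ki from Categorical(eta_{kx} / eta_k).
        u, cdf, z = rng.random() * total, 0.0, 1
        while z < x_max:
            cdf += eta_kx(k, z, gamma, lam, d)
            if u <= cdf:
                break
            z += 1
        rates.append(rng.gammavariate(z - d, 1.0 / (lam + k)))  # Gam(z - d, rate lam + k)
    return rates
```

A truncated sample is obtained by concatenating `size_biased_round(k, ...)` for $k = 1, \dots, K$. Summing `eta_kx` over `x` and comparing with `eta_k` is a cheap numerical check that the two closed forms agree.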


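Power-law We say $\Theta$ has a power-law representation and write $\Theta \leftarrow \text{PL-Rep}(\gamma, \alpha, d, g)$ if for $\gamma > 0$, $0 \le d < 1$, $\alpha > -d$, and $g$ a density on $\mathbb{R}_+$,

\[ \Theta = \sum_{k=1}^{\infty}\sum_{i=1}^{C_k} \theta_{ki}\delta_{\psi_{ki}}, \quad\text{with}\quad C_k \overset{\text{i.i.d.}}{\sim} \mathrm{Poiss}(\gamma), \quad V_{ki} \overset{\text{i.i.d.}}{\sim} g, \quad \theta_{ki} = V_{ki}U_{kik}\prod_{j=1}^{k-1}(1 - U_{kij}), \quad U_{kij} \overset{\text{indep}}{\sim} \mathrm{Beta}(1 - d, \alpha + jd). \tag{5} \]
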


This is a novel superposition representation, although it was previously developed in the special case of the beta process (where $g(v) = \delta_1$) [Broderick et al., 2012]. The name of this representation arises from the fact that it exhibits Types I and II power-law behavior [Broderick et al., 2012] under mild conditions when $d > 0$, as we show in Theorem A.2.1 in the appendix (note, however, that it will not exhibit power-law behavior when $d = 0$). Theorem 4.2.5 below shows the conditions under which $\Theta \leftarrow \text{PL-Rep}(\gamma, \alpha, d, g)$ implies $\Theta \sim \mathrm{CRM}(\nu)$. Its proof in Appendix A.2 relies on the notion of stochastic mapping (Lemma 4.2.4), a powerful technique for transforming one CRM into another. Note that in Lemma 4.2.4, the case where $u$ is a deterministic function of $\theta$ via the mapping $u = \tau(\theta)$ may be recovered by setting $\kappa(\theta, du) = \delta_{\tau(\theta)}(du)$.

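Lemma 4.2.4 (CRM stochastic mapping). Let $\Theta = \sum_{k=1}^{\infty} \theta_k \delta_{\psi_k} \sim \mathrm{CRM}(\nu)$. Then for any probability kernel $\kappa(\theta, du)$, we have $\kappa(\Theta) \sim \mathrm{CRM}(\nu_\kappa)$, where

\[ \kappa(\Theta) := \sum_{k=1}^{\infty} u_k \delta_{\psi_k}, \qquad u_k \mid \theta_k \sim \kappa(\theta_k, \cdot), \qquad\text{and}\qquad \nu_\kappa(du) := \int \kappa(\theta, du)\,\nu(d\theta). \]
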


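Theorem 4.2.5 (Power-law representation). Let $\nu(d\theta) = \nu(\theta)\,d\theta$ be a rate measure satisfying Eq. (1), and let $g_\nu$ be a density on $\mathbb{R}_+$ such that

\[ \nu(u) = \int \theta^{-1} g_\nu\!\left(u\theta^{-1}\right)\nu_{\mathrm{BP}}(d\theta), \]

where

\[ \nu_{\mathrm{BP}}(d\theta) = \gamma\frac{\Gamma(\alpha + 1)}{\Gamma(1 - d)\Gamma(\alpha + d)}\,\mathbb{1}[\theta \le 1]\,\theta^{-1-d}(1 - \theta)^{\alpha+d-1}\,d\theta \]

is the rate measure for the beta process $\mathrm{BP}(\gamma, \alpha, d)$ from Eq. (24). Then

\[ \Theta \leftarrow \text{PL-Rep}(\gamma, \alpha, d, g_\nu) \quad\text{implies}\quad \Theta \sim \mathrm{CRM}(\nu). \]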

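Example 4.2.8 (Power-law representation for $\Gamma\mathrm{P}(\gamma, \lambda, d)$). If we choose $g_\nu = \mathrm{Gam}(\lambda, \lambda)$, then using the change of variable $w = u(\theta^{-1} - 1)$,

\[
\begin{aligned}
\int \theta^{-1} g_\nu\left(u\theta^{-1}\right) \nu_{\mathrm{BP}}(\mathrm{d}\theta)\,\mathrm{d}u
&= \gamma\lambda\,\frac{\lambda^{\lambda}}{\Gamma(1-d)\Gamma(\lambda+d)}\,u^{\lambda-1}\int \theta^{-\lambda-1-d}e^{-\lambda u\theta^{-1}}(1-\theta)^{\lambda+d-1}\,\mathrm{d}\theta\,\mathrm{d}u \\
&= \gamma\lambda\,\frac{\lambda^{\lambda}}{\Gamma(1-d)\Gamma(\lambda+d)}\,u^{-1-d}e^{-\lambda u}\int w^{\lambda+d-1}e^{-\lambda w}\,\mathrm{d}w\,\mathrm{d}u \\
&= \gamma\,\frac{\lambda^{1-d}}{\Gamma(1-d)}\,u^{-1-d}e^{-\lambda u}\,\mathrm{d}u.
\end{aligned}
\]

It follows immediately from Theorem 4.2.5 that if $\Theta \leftarrow \text{PL-Rep}(\gamma, \lambda, d, \mathrm{Gam}(\lambda, \lambda))$, then $\Theta \sim \Gamma\mathrm{P}(\gamma, \lambda, d)$. To the best knowledge of the authors, this power-law representation for the gamma process is novel.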


3.3  Truncation  analysis 

Each of the sequential representations developed in Section 4.2 shares a common structural element, an outer infinite sum, which is responsible for generating a countably infinite number of atoms in the CRM. In this section, we terminate these outer sums at a finite truncation level $K \in \mathbb{N}$, resulting in a truncated CRM $\Theta_K$ possessing a finite number of atoms. We develop upper bounds on the error induced by this truncation procedure. All of the truncated CRM error bounds in this section rely on Lemma 4.3.1, which is a tightening (by a factor of two) of the bound in Ishwaran and James [2001, 2002] (for its generalization to arbitrary discrete random measures, see Lemma A.3.1). Proposition 4.3.2 shows that the bound in Lemma 4.3.1 is tight without further assumptions on the data likelihood $f$.

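Lemma 4.3.1 (CRM protobound). Let $\Theta \sim \mathrm{CRM}(\nu)$. For any truncation $\Theta_K$, if

\[
X_n \mid \Theta \overset{\text{i.i.d.}}{\sim} \mathrm{LP}(h, \Theta), \qquad Z_n \mid \Theta_K \overset{\text{i.i.d.}}{\sim} \mathrm{LP}(h, \Theta_K),
\]
\[
Y_n \mid X_n \overset{\text{indep}}{\sim} f(\cdot \mid X_n), \qquad W_n \mid Z_n \overset{\text{indep}}{\sim} f(\cdot \mid Z_n),
\]

then, with $p_{N,\infty}$ and $p_{N,K}$ denoting the marginal densities of $Y_{1:N}$ and $W_{1:N}$, respectively,

\[
\frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - \mathbb{P}\left(\mathrm{supp}(X_{1:N}) \subseteq \mathrm{supp}(\Theta_K)\right).
\]

Proposition 4.3.2 (Protobound tightness). If $G$ is non-atomic and $\Psi$ is Borel, then for any $\delta > 0$, there exists a likelihood $f$ such that Lemma 4.3.1 is tight up to a factor of $1 - \delta$.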


The proofs of all results in this section (including Lemma 4.3.1 and Proposition 4.3.2) can be found in Appendix A.3. All of the provided truncation results use the generative model in Lemma 4.3.1, and are summarized in Table 1 in Chapter 5. Throughout this section, for a given likelihood model $h(x \mid \theta)$ we define $\pi(\theta) := h(0 \mid \theta)$ for notational brevity. The asymptotic behavior of truncation error bounds is specified with tilde notation:

\[
a(K) \sim b(K), \quad K \to \infty \quad \iff \quad \lim_{K \to \infty} \frac{a(K)}{b(K)} = 1.
\]


3.3.1  Series  representations 

Each of the series representations can be viewed as a functional of a standard Poisson point process and a sequence of i.i.d. random variables with some distribution $g$ on $\mathbb{R}_+$. In particular, we may write each in the form

\[
\Theta = \sum_{k=1}^{\infty} \theta_k\delta_{\psi_k}, \quad \text{with} \quad \theta_k = \tau(V_k, \Gamma_k), \quad V_k \overset{\text{i.i.d.}}{\sim} g,
\tag{6}
\]

where $\Gamma_k$ are the jumps of a unit-rate homogeneous Poisson point process on $\mathbb{R}_+$, and $\tau : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ is a non-negative measurable function such that $\lim_{u \to \infty} \tau(v, u) = 0$ for $g$-almost every $v$. The truncated CRM then takes the form

\[
\Theta_K := \sum_{k=1}^{K} \theta_k\delta_{\psi_k}.
\]

Theorem 4.3.3 provides a general truncation error bound for series representations of the form Eq. (6), specifies its range, and guarantees that the bound decays to 0 as $K \to \infty$.
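The truncated form of Eq. (6) translates directly into a sampling loop. The following is a minimal sketch under stated assumptions: `truncated_series_rates`, `tau`, and `sample_v` are hypothetical names standing in for the truncation, the function $\tau$, and a sampler for $g$; atom locations are not generated.

```python
import math
import random

def truncated_series_rates(K, tau, sample_v, rng):
    """Return theta_1..theta_K with theta_k = tau(V_k, Gamma_k)."""
    rates, jump = [], 0.0
    for _ in range(K):
        # Gamma_k: k-th arrival of a unit-rate Poisson process,
        # built as a running sum of Exp(1) inter-arrival times.
        jump += rng.expovariate(1.0)
        rates.append(tau(sample_v(rng), jump))
    return rates
```

As an illustration only, `tau = lambda v, u: v * math.exp(-u / c)` with a fixed constant `c` gives a Bondesson-type transformation of the kind discussed later in this section.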


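Theorem 4.3.3 (Series representation truncation error). The error in approximating a series representation of $\Theta$ with its truncation $\Theta_K$ satisfies

\[
0 \le \frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - e^{-B_{N,K}} \le 1,
\]

where

\[
B_{N,K} := \int_0^{\infty} \left(1 - \mathbb{E}\left[\pi\left(\tau(V, u + G_K)\right)^N\right]\right)\mathrm{d}u,
\tag{7}
\]

$G_0 := 0$, $G_K \sim \mathrm{Gam}(K, 1)$ for $K \ge 1$, and $V \sim g$ independently. Furthermore, $\forall N \in \mathbb{N}$, $\lim_{K \to \infty} B_{N,K} = 0$.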

Remark. An alternate form of $B_{N,K}$ that is sometimes easier to use in practice can be found by applying the standard geometric series formula to Eq. (7), which yields

\[
B_{N,K} = \sum_{n=1}^{N} \int_0^{\infty} \mathbb{E}\left[\pi\left(\tau(V, u + G_K)\right)^{n-1}\left(1 - \pi\left(\tau(V, u + G_K)\right)\right)\right]\mathrm{d}u.
\tag{8}
\]

A simplified upper bound on $B_{N,K}$ can be derived by noting that $\pi(\theta) \le 1$, so

\[
B_{N,K} \le N \int_0^{\infty} \left(1 - \mathbb{E}\left[\pi\left(\tau(V, u + G_K)\right)\right]\right)\mathrm{d}u.
\]

This bound usually gives the same asymptotics in $K$ as Eq. (7).

The main task in using Theorem 4.3.3 to develop a truncation error bound for a series representation is evaluating the integrand in the definition of $B_{N,K}$. Thus, we next evaluate the integrand and provide expressions of the truncation error bound for the four series representations outlined in Section 4.2.1. Throughout the remainder of this section, $G_K$ is defined as in Theorem 4.3.3, $F_0 \equiv 1$, and $F_K$ is the CDF of $G_K$.

Inverse-Lévy representation For this representation we have

\[
\tau(v, u) = \nu^{\leftarrow}(u) := \inf\left\{y : \nu([y, \infty)) \le u\right\}.
\]

To evaluate the bound in Eq. (8), we use the transformation of variables $x = \nu^{\leftarrow}(u + G_K)$ and the fact that for $a, b \ge 0$, $\nu^{\leftarrow}(a) \ge b \iff a \le \nu([b, \infty))$ to conclude that

\[
B_{N,K} \le N \int_0^{\infty} F_K\left(\nu[x, \infty)\right)(1 - \pi(x))\,\nu(\mathrm{d}x).
\tag{9}
\]

Recent work on the inverse-Lévy representation has developed Monte Carlo estimates of the error of the truncated random measure moments for those $\nu([x, \infty))$ with known inverse $\nu^{\leftarrow}$ [Arbel and Prünster, 2017]. In contrast, the result above provides an explicit bound on the $L^1$ truncation error. Our bound does not require knowing $\nu^{\leftarrow}$, which is often the most challenging aspect of applying the inverse-Lévy representation.

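Example 4.3.1 (IL-Rep truncation for $\mathrm{LomP}(\gamma, \lambda^{-1})$ with Poisson likelihood). Recall from Example 4.2.4 that the Lomax process $\mathrm{LomP}(\gamma, \lambda^{-1})$ is the CRM with rate measure $\nu(\mathrm{d}\theta) = \gamma\lambda\theta^{-1}(1 + \lambda\theta)^{-1}\mathrm{d}\theta$, so $\nu[x, \infty) = \gamma\lambda\log\{1 + (\lambda x)^{-1}\}$. Using Eq. (9), we have

\[
B_{N,K} \le N\gamma\lambda \int_0^{\infty} F_K\left(\gamma\lambda\log\{1 + (\lambda x)^{-1}\}\right)(1 - e^{-x})\,x^{-1}(1 + \lambda x)^{-1}\,\mathrm{d}x.
\]

Since $F_K(t) \le t^K/K! \le (3t/K)^K$, for any $a > 0$ the integral is upper bounded by

\[
\begin{aligned}
&\int_0^a (1 + \lambda x)^{-1}\,\mathrm{d}x + F_K\left(\gamma\lambda\log\{1 + (\lambda a)^{-1}\}\right)\int_a^{\infty} x^{-1}(1 + \lambda x)^{-1}\,\mathrm{d}x \\
&\quad \le a + F_K\left(\gamma\lambda\log\{1 + (\lambda a)^{-1}\}\right)\log\{1 + (\lambda a)^{-1}\} \\
&\quad \le \lambda^{-1}(e^b - 1)^{-1} + b\,(3\gamma\lambda b/K)^K, \quad \text{where } b := \log\{1 + (\lambda a)^{-1}\}.
\end{aligned}
\tag{10}
\]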

Replacing $(e^b - 1)^{-1}$ with the approximation $e^{-b}$ and then setting the two terms in Eq. (10) equal,

(f  K+2  /  x  1 

<  37Ak+!  (K  +  1)  K+] 

i.e. 


, where $W_0$ is the product logarithm function,

\[
W_0(y) = x \iff x e^x = y.
\tag{11}
\]
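Thus, using the fact that $e^{-b} \le (e^b - 1)^{-1}$ and that $\lambda^{\frac{K+2}{K+1}}(K+1)^{\frac{1}{K+1}}$ reaches its maximum at $K = \max(0, e\lambda^{-1} - 1)$, we conclude that

\[
B_{N,K} \le \frac{2N\gamma\left[1 + (3\gamma\lambda)^{-1}\right]}{\exp\left(K W_0\left(\{3\gamma\lambda\max(\lambda, e)\}^{-1}\right)\right) - 1}
\sim 2N\gamma\left[1 + (3\gamma\lambda)^{-1}\right]e^{-K W_0\left(\{3\gamma\lambda\max(\lambda, e)\}^{-1}\right)}, \quad K \to \infty.
\]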


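Bondesson representation For this representation we have

\[
\tau(v, u) = v e^{-u/c}, \qquad g(\mathrm{d}v) = -c^{-1}\frac{\mathrm{d}}{\mathrm{d}v}\left(v\nu(v)\right)\mathrm{d}v.
\]

Writing the expectation over $V$ explicitly as an integral with measure $g(v)\mathrm{d}v$, using the transformation of variables $u = -c\log(x/v)$ (so $x = v e^{-u/c}$), and given the definition of $g(v) = -c^{-1}\frac{\mathrm{d}}{\mathrm{d}v}\left(v\nu(v)\right)$ for the Bondesson representation, we have

\[
B_{N,K} \le N \int_0^{\infty} \left(1 - \mathbb{E}\left[\pi\left(v e^{-G_K/c}\right)\right]\right)\nu(\mathrm{d}v).
\]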

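Example 4.3.2 (Truncation of the Bondesson representation for $\Gamma\mathrm{P}(\gamma, \lambda, 0)$). Let $\tilde{G}_K = G_K/(\gamma\lambda)$. Since $\pi(\theta) = e^{-\theta}$ and $c = \gamma\lambda$, we have

\[
\begin{aligned}
\int_0^{\infty}\left(1 - \mathbb{E}\left[\pi\left(v e^{-\tilde{G}_K}\right)\right]\right)\nu(\mathrm{d}v)
&= \gamma\lambda\,\mathbb{E}\left[\int_0^{\infty}\left(1 - e^{-v e^{-\tilde{G}_K}}\right)v^{-1}e^{-\lambda v}\,\mathrm{d}v\right] \\
&= \gamma\lambda\,\mathbb{E}\left[\log\left(1 + e^{-\tilde{G}_K}/\lambda\right)\right] \\
&\le \gamma\,\mathbb{E}\left[e^{-\tilde{G}_K}\right] = \gamma\left(\frac{\gamma\lambda}{1 + \gamma\lambda}\right)^K.
\end{aligned}
\]

The second equality follows by using the power series for the exponential integral [Abramowitz and Stegun, 1964, Chapter 5]. Thus,

\[
B_{N,K} \le N\gamma\left(\frac{\gamma\lambda}{1 + \gamma\lambda}\right)^K.
\]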
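To make the geometric decay in Example 4.3.2 concrete, the sketch below evaluates the resulting error bound $1 - e^{-B}$ with $B = N\gamma(\gamma\lambda/(1+\gamma\lambda))^K$ (combining the example with Theorem 4.3.3) and searches for the smallest truncation level that meets a target error. The function names and the tolerance argument `eps` are hypothetical and chosen for illustration.

```python
import math

def bondesson_error_bound(N, K, gamma, lam):
    """Upper bound 1 - exp(-B), with B = N*gamma*(gamma*lam/(1+gamma*lam))**K."""
    b = N * gamma * (gamma * lam / (1.0 + gamma * lam)) ** K
    return -math.expm1(-b)  # numerically stable 1 - exp(-b)

def smallest_level(N, gamma, lam, eps):
    """Smallest K whose bound is at most eps (terminates because the ratio is < 1)."""
    K = 0
    while bondesson_error_bound(N, K, gamma, lam) > eps:
        K += 1
    return K
```

Because the bound decays geometrically in $K$, the required truncation level grows only logarithmically in $N/\epsilon$ under this bound.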

Thinning representation For this representation we have

\[
\tau(v, u) = v\,\mathbb{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}g}(v) \ge u\right], \qquad g \text{ any distribution on } \mathbb{R}_+ \text{ s.t. } \nu \ll g.
\]

Since $\pi(0) = 1$ by Lemma A.1.3, we have that $1 - \pi(v\mathbb{1}(A)) = (1 - \pi(v))\mathbb{1}(A)$ for any event $A$. Using this fact, we have

\[
B_{N,K} \le N \int_0^{\infty} (1 - \pi(v)) \int_0^{\frac{\mathrm{d}\nu}{\mathrm{d}g}(v)} F_K(u)\,\mathrm{d}u\,g(\mathrm{d}v).
\tag{12}
\]

Analytic bounds for the thinning representation of specific processes tend to be opaque and notationally cumbersome, so we simply compare its truncation error in Chapter 5 to the other representations by numerical approximation of Eq. (12).
Rejection representation Assume that we can use the inverse-Lévy representation to simulate $\mathrm{PoissP}(\mu)$. Then for the rejection representation we have

\[
\tau(v, u) = \mu^{\leftarrow}(u)\,\mathbb{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\left(\mu^{\leftarrow}(u)\right) \ge v\right], \qquad g(\mathrm{d}v) = \mathbb{1}[0 \le v \le 1]\,\mathrm{d}v,
\]

where $\mu$ satisfies $\nu \ll \mu$, $\frac{\mathrm{d}\nu}{\mathrm{d}\mu} \le 1$, and $\mu^{\leftarrow}(u) := \inf\{x : \mu([x, \infty)) \le u\}$. Using the same techniques as for the thinning and inverse-Lévy representations, we have that

\[
B_{N,K} \le N \int_0^{\infty} F_K\left(\mu[x, \infty)\right)(1 - \pi(x))\,\nu(\mathrm{d}x).
\tag{13}
\]


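Example 4.3.3 (R-Rep truncation for $\Gamma\mathrm{P}(\gamma, \lambda, 0)$ with Poisson likelihood). Using Eq. (13) and the fact that $1 - e^{-x} \le x$, we have

\[
B_{N,K} \le N\gamma\lambda \int_0^{\infty} F_K\left(\gamma\lambda\log\{1 + (\lambda x)^{-1}\}\right)e^{-\lambda x}\,\mathrm{d}x.
\tag{14}
\]

Arguing as in Example 4.3.1, we see that the integral in Eq. (14) is upper bounded by

\[
\begin{aligned}
&\int_0^a e^{-\lambda x}\,\mathrm{d}x + F_K\left(\gamma\lambda\log\{1 + (\lambda a)^{-1}\}\right)\int_a^{\infty} e^{-\lambda x}\,\mathrm{d}x \\
&\quad \le a + \lambda^{-1}F_K\left(\gamma\lambda\log\{1 + (\lambda a)^{-1}\}\right) \\
&\quad \le \lambda^{-1}\left((e^b - 1)^{-1} + (3\gamma\lambda b/K)^K\right),
\end{aligned}
\tag{15}
\]

where $b := \log\{1 + (\lambda a)^{-1}\}$. Replacing $(e^b - 1)^{-1}$ with the approximation $e^{-b}$ and then setting the two terms in Eq. (15) equal to each other, we obtain $b = K W_0(\{3\gamma\lambda\}^{-1})$ (where $W_0$ is defined in Eq. (11)) and conclude that

\[
B_{N,K} \le \frac{2N\gamma}{e^{K W_0(\{3\gamma\lambda\}^{-1})} - 1} \sim 2N\gamma\,e^{-K W_0(\{3\gamma\lambda\}^{-1})}, \quad K \to \infty.
\]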
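Evaluating a bound of the form $2N\gamma / (e^{K W_0(\{3\gamma\lambda\}^{-1})} - 1)$ requires the product logarithm $W_0$. The sketch below computes $W_0$ on $[0, \infty)$ with Newton's method using only the standard library; the function names are hypothetical, and the Newton solver is an illustrative choice rather than anything prescribed by the text.

```python
import math

def lambert_w0(y, iters=60):
    """Principal branch W_0(y) for y >= 0: the x >= 0 solving x*exp(x) = y."""
    x = math.log1p(y)  # starts at or above the root since W_0(y) <= log(1 + y)
    for _ in range(iters):
        ex = math.exp(x)
        x -= (x * ex - y) / (ex * (x + 1.0))  # Newton step on f(x) = x e^x - y
    return x

def rejection_gamma_bound(N, K, gamma, lam):
    """Evaluate 2*N*gamma / (exp(K * W_0(1/(3*gamma*lam))) - 1) for K >= 1."""
    w = lambert_w0(1.0 / (3.0 * gamma * lam))
    return 2.0 * N * gamma / math.expm1(K * w)
```

Since $W_0(\{3\gamma\lambda\}^{-1}) > 0$, the value returned by `rejection_gamma_bound` decays exponentially in `K`, matching the stated asymptotics.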

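Example 4.3.4 (R-Rep truncation for $\Gamma\mathrm{P}(\gamma, \lambda, d)$ with Poisson likelihood, $d > 0$). We have

\[
B_{N,K} \le N\,\frac{\gamma\lambda^{1-d}}{\Gamma(1-d)}\int_0^{\infty} F_K\left(\gamma' x^{-d}\right)(1 - e^{-x})\,x^{-1-d}e^{-\lambda x}\,\mathrm{d}x.
\]

The integral can be upper bounded as

\[
\begin{aligned}
&\int_0^a x^{-d}\,\mathrm{d}x + F_K\left(\gamma' a^{-d}\right)\int_a^{\infty}(1 - e^{-x})\,x^{-1-d}e^{-\lambda x}\,\mathrm{d}x \\
&\quad \le (1-d)^{-1}a^{1-d} + \Gamma(-d)\left(\lambda^d - (1+\lambda)^d\right)\left(3\gamma' K^{-1}a^{-d}\right)^K.
\end{aligned}
\]

Setting the two terms equal and solving for $a$, we obtain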

l^ld  ,  d(i-d)  f  37A1^d 


Bn,k  <  2  N 


2N 


r(2  -  d) 


7A 


[(1  -  d)r(-(i)]w 


dr(l  -  d)K 


Kd(l-d) 

d(l-d)+K 


1-d 


r(2  -  d) 


37A1 


1-d  1  d(l-d) 


dT(l  -  d) 


j^-d(l-d) 


K  — y  00. 
3.3.2 Superposition representations

For superposition representations, the truncated CRM takes the form

\[
\Theta_K := \sum_{k=1}^{K}\sum_{i=1}^{C_k} \theta_{ki}\delta_{\psi_{ki}}.
\]

Let $\Theta_K^+ := \Theta - \Theta_K$ denote the tail measure. By the superposition property of Poisson point processes [Kingman, 1993], the tail measure is itself a CRM with some rate measure $\nu_K^+$ and is independent of $\Theta_K$:

\[
\Theta_K^+ = \sum_{k=K+1}^{\infty}\sum_{i=1}^{C_k} \theta_{ki}\delta_{\psi_{ki}} \sim \mathrm{CRM}\left(\nu_K^+\right), \qquad \Theta_K^+ \perp\!\!\!\perp \Theta_K, \qquad \Theta = \Theta_K + \Theta_K^+.
\tag{16}
\]

The following result provides a general truncation error bound for superposition representations, specifies its range, and guarantees that the bound decays to 0 as $K \to \infty$.

Theorem 4.3.4 (Superposition representation truncation error). The error in approximating a superposition representation of $\Theta \sim \mathrm{CRM}(\nu)$ with its truncation $\Theta_K$ satisfies

\[
0 \le \frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - e^{-B_{N,K}} \le 1,
\]

where

\[
B_{N,K} := \int \left(1 - \pi(\theta)^N\right)\nu_K^+(\mathrm{d}\theta).
\tag{17}
\]

Furthermore, $\forall N \in \mathbb{N}$, $\lim_{K \to \infty} B_{N,K} = 0$.

Remark. As for series representations, an alternate form of $B_{N,K}$ that is sometimes easier to use can be found by applying the standard geometric series formula to Eq. (17):

\[
B_{N,K} = \sum_{n=1}^{N} \int \pi(\theta)^{n-1}(1 - \pi(\theta))\,\nu_K^+(\mathrm{d}\theta).
\]

A simplified upper bound on $B_{N,K}$ can be derived by noting that $\pi(\theta) \le 1$, so

\[
B_{N,K} \le N \int_0^{\infty} (1 - \pi(\theta))\,\nu_K^+(\mathrm{d}\theta).
\]

This bound usually gives the same asymptotics in $K$ as Eq. (17).

The main task in using Theorem 4.3.4 to develop a truncation error bound for a superposition representation is determining its tail measure $\nu_K^+$. In the following, we provide the tail measure for the three superposition representations outlined in Section 4.2.2.
Decoupled Bondesson representation For each point process in the superposition, an average of $c/\xi$ atoms are generated with independent weights of the form $V e^{-T_k}$ where $V \sim g$ and $T_k \sim \mathrm{Gam}(k, \xi)$. Therefore, the tail measure is

\[
\nu_K^+(\mathrm{d}\theta) = \frac{c}{\xi}\sum_{k=K+1}^{\infty} \tilde{g}_{k,\xi}(\theta)\,\mathrm{d}\theta,
\]

where $\tilde{g}_{k,\xi}$ is the density of $V e^{-T_k}$. The bound for the decoupled Bondesson representation can therefore be expressed as

\[
B_{N,K} \le N\,\frac{c}{\xi}\sum_{k=K+1}^{\infty} \mathbb{E}\left[1 - \pi\left(V e^{-T_k}\right)\right].
\]

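Example 4.3.5 (Decoupled Bondesson representation truncation for $\Gamma\mathrm{P}(\gamma, \lambda, 0)$). Using the fact that $1 - e^{-\theta} \le \theta$, we have

\[
B_{N,K} \le \frac{N\gamma\lambda}{\xi}\sum_{k=K+1}^{\infty} \mathbb{E}\left[1 - \pi\left(V_{k1}e^{-T_{k1}}\right)\right]
\le \frac{N\gamma\lambda}{\xi}\sum_{k=K+1}^{\infty} \mathbb{E}\left[V_{k1}e^{-T_{k1}}\right]
= \frac{N\gamma}{\xi}\sum_{k=K+1}^{\infty}\left(\frac{\xi}{1+\xi}\right)^k
= N\gamma\left(\frac{\xi}{1+\xi}\right)^K,
\]

which is equivalent (up to a factor of 2) to the bound in Roychowdhury and Kulis [2015].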

Size-biased representation The constructive derivation of the size-biased representation [Broderick et al., 2017, proof of Theorem 5.1] immediately yields

\[
\nu_K^+(\mathrm{d}\theta) = \pi(\theta)^K\nu(\mathrm{d}\theta).
\]

Therefore, the size-biased representation truncation error bound can be expressed using the formula for $\eta_k$ from Eq. (4) as

\[
B_{N,K} = \sum_{n=1}^{N} \int \pi(\theta)^{K+n-1}(1 - \pi(\theta))\,\nu(\mathrm{d}\theta) = \sum_{n=1}^{N} \eta_{K+n}.
\tag{18}
\]

Example 4.3.6 (Size-biased representation truncation for $\Gamma\mathrm{P}(\gamma, \lambda, d)$). For $d > 0$, the standard gamma integral yields

\[
\eta_k = \int \pi(\theta)^{k-1}(1 - \pi(\theta))\,\nu(\mathrm{d}\theta) = \frac{\gamma\lambda^{1-d}}{d}\left((\lambda+k)^d - (\lambda+k-1)^d\right).
\]

The sum from Eq. (18) is telescoping, so canceling terms,

\[
B_{N,K} \le \frac{\gamma\lambda^{1-d}}{d}\left((\lambda+K+N)^d - (\lambda+K)^d\right) \sim \gamma N\lambda^{1-d}K^{d-1}, \quad K \to \infty,
\]

where the asymptotic result follows from Lemma A.1.4. To analyze the $d = 0$ case, we use L'Hôpital's rule to take the limit of the integral:

\[
\lim_{d \to 0}\int \pi(\theta)^{k-1}(1 - \pi(\theta))\,\nu(\mathrm{d}\theta) = \gamma\lambda\left(\log(\lambda+k) - \log(\lambda+k-1)\right).
\]

Canceling terms in the telescopic sum yields

\[
B_{N,K} \le \gamma\lambda\left(\log(\lambda+K+N) - \log(\lambda+K)\right) \sim \gamma\lambda N K^{-1}, \quad K \to \infty,
\]

where the asymptotic result follows from an application of Lemma A.1.4.
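The telescoping argument in Example 4.3.6 can be checked numerically. The sketch below sums $\eta_{K+1}, \dots, \eta_{K+N}$ term by term and compares the result with the closed form; the function names are hypothetical, and the formulas for $\eta_k$ are the $d > 0$ and $d = 0$ expressions read from the example.

```python
import math

def eta(k, gamma, lam, d):
    """eta_k for the gamma process with Poisson likelihood (d > 0 or d == 0)."""
    if d > 0:
        return gamma * lam ** (1 - d) / d * ((lam + k) ** d - (lam + k - 1) ** d)
    return gamma * lam * (math.log(lam + k) - math.log(lam + k - 1))

def size_biased_sum(N, K, gamma, lam, d):
    """Direct evaluation of sum_{n=1}^N eta_{K+n} (Eq. (18))."""
    return sum(eta(K + n, gamma, lam, d) for n in range(1, N + 1))

def size_biased_closed_form(N, K, gamma, lam, d):
    """Telescoped value of the same sum."""
    if d > 0:
        return gamma * lam ** (1 - d) / d * ((lam + K + N) ** d - (lam + K) ** d)
    return gamma * lam * (math.log(lam + K + N) - math.log(lam + K))
```

Comparing `size_biased_closed_form(N, K, ...)` with `gamma * N * lam ** (1 - d) * K ** (d - 1)` at large `K` gives a quick sanity check of the stated asymptotic rate.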

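Power-law representation For each point process in the superposition, an average of $\gamma$ atoms are generated with independent weights of the form $V U_k\prod_{\ell=1}^{k-1}(1 - U_\ell)$, where $V \sim g$ and $U_\ell \overset{\text{indep}}{\sim} \mathrm{Beta}(1 - d, \alpha + \ell d)$. Therefore, the tail measure is

\[
\nu_K^+(\mathrm{d}\theta) = \gamma\sum_{k=K+1}^{\infty} \tilde{g}_k(\theta)\,\mathrm{d}\theta,
\]

where $\tilde{g}_k$ is the density of the random variable $V U_k\prod_{\ell=1}^{k-1}(1 - U_\ell)$. The truncation error bound may be expressed as

\[
B_{N,K} \le N\gamma\sum_{k=K+1}^{\infty} \mathbb{E}\left[1 - \pi\left(V U_k\prod_{\ell=1}^{k-1}(1 - U_\ell)\right)\right].
\]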

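Example 4.3.7 (Power-law representation truncation for $\Gamma\mathrm{P}(\gamma, \lambda, d)$). Let $\beta_k$ be a random variable with density $\tilde{g}_k$ (with $\lambda$ in the place of $\alpha$). Using $1 - e^{-\theta} \le \theta$, we have

\[
\sum_{k=K+1}^{\infty} \mathbb{E}[1 - \pi(\beta_k)] \le \sum_{k=K+1}^{\infty} \mathbb{E}[\beta_k] = \mathbb{E}\left[\sum_{k=K+1}^{\infty} \beta_k\right] = \prod_{k=1}^{K} \frac{\lambda + kd}{\lambda + kd - d + 1},
\]

where the final equality follows from Ishwaran and James [2001, Theorem 1]. Thus,

\[
B_{N,K} \le \gamma N\prod_{k=1}^{K} \frac{\lambda + kd}{\lambda + kd - d + 1}
\sim \begin{cases}
\gamma N\left(\dfrac{\lambda}{\lambda+1}\right)^K & d = 0 \\[2ex]
\gamma N\,\dfrac{\Gamma((\lambda+1)/d)}{\Gamma((\lambda+d)/d)}\,K^{1 - d^{-1}} & 0 < d < 1
\end{cases}
\quad K \to \infty,
\tag{19}
\]

where the $0 < d < 1$ case in Eq. (19) follows by Lemma A.1.5 applied to

\[
\prod_{k=1}^{K} \frac{\lambda + kd}{\lambda + kd - d + 1} = \frac{\Gamma((\lambda+1)/d)}{\Gamma((\lambda+d)/d)}\,\frac{\Gamma(\lambda/d + K + 1)}{\Gamma(\lambda/d + K + d^{-1})}.
\]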
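The gamma-function identity used for the $0 < d < 1$ case of Example 4.3.7 is easy to confirm numerically. The sketch below evaluates the product $\prod_{k=1}^{K} (\lambda + kd)/(\lambda + kd - d + 1)$ directly and via log-gamma functions; the function names are hypothetical and the code is only a consistency check.

```python
import math

def stick_product(K, lam, d):
    """Direct product of (lam + k*d) / (lam + k*d - d + 1) for k = 1..K."""
    prod = 1.0
    for k in range(1, K + 1):
        prod *= (lam + k * d) / (lam + k * d - d + 1.0)
    return prod

def gamma_ratio(K, lam, d):
    """Same quantity written as a ratio of gamma functions (requires 0 < d < 1)."""
    log_val = (math.lgamma((lam + 1.0) / d) - math.lgamma((lam + d) / d)
               + math.lgamma(lam / d + K + 1.0) - math.lgamma(lam / d + K + 1.0 / d))
    return math.exp(log_val)
```

Working in log space with `math.lgamma` avoids overflow of the individual gamma functions when `K` or `lam / d` is large.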
3.3.3 Stochastic mapping

We now show how truncation bounds developed elsewhere in this paper can be applied to CRM representations that have been transformed using Lemma 4.2.4. For $\Theta \sim \mathrm{CRM}(\nu)$, we denote its transformation by $\tilde{\Theta} = \kappa(\Theta)$. For any object defined with respect to $\Theta$, the corresponding object for $\tilde{\Theta}$ is denoted with a tilde. For example, in place of $N$ and $X_{1:N}$ (for $\Theta$), we use $\tilde{N}$ and $\tilde{X}_{1:\tilde{N}}$ (for $\tilde{\Theta}$). We make $B_{N,K}$ a function of $\pi(\theta)$ in the notation of Proposition 4.3.5; when one applies stochastic mapping to a CRM, one usually also wants to change the likelihood $h(x \mid \theta)$, and thus also changes $\pi(\theta) = h(0 \mid \theta)$. The proof of Proposition 4.3.5 may be found in Appendix A.3.

Proposition 4.3.5 (Truncation error under a stochastic mapping). Consider a representation for $\Theta \sim \mathrm{CRM}(\nu)$ with truncation error bound $B_{N,K}(\pi)$. Then for any likelihood $\tilde{h}(x \mid u)$, if $\tilde{\Theta}$ is a stochastic mapping of $\Theta$ under the probability kernel $\kappa(\theta, \mathrm{d}u)$, its truncation error bound is $B_{1,K}(\pi_{\kappa,\tilde{N}})$, where $\pi_{\kappa,\tilde{N}}(\theta) := \int \tilde{h}(0 \mid u)^{\tilde{N}}\,\kappa(\theta, \mathrm{d}u)$.

3.3.4  Hyperpriors 

In practice, prior distributions are often placed on the hyperparameters of the CRM rate measure (i.e. $\gamma$, $\alpha$, $\lambda$, $d$, etc.). We conclude our investigation of CRM truncation error by showing how bounds developed in this section can be modified to account for the use of hyperpriors. Note that we make the dependence of $B_{N,K}$ on the hyperparameters $\Phi$ explicit in the notation of Proposition 4.3.6.

Proposition 4.3.6 (CRM truncation error with a hyperprior). Given hyperparameters $\Phi$, consider a representation for $\Theta \mid \Phi \sim \mathrm{CRM}(\nu)$, and let $B_{N,K}(\Phi)$ be given by Eq. (7) (for a series representation) or Eq. (17) (for a superposition representation). The error of approximating $\Theta$ with its truncation $\Theta_K$ satisfies

\[
0 \le \frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - e^{-\mathbb{E}[B_{N,K}(\Phi)]} \le 1.
\]

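Example 4.3.8 (Decoupled Bondesson representation truncation for $\Gamma\mathrm{P}(\gamma, \lambda, 0)$). A standard choice of hyperprior for the mass $\gamma$ is a gamma distribution, i.e. $\gamma \sim \mathrm{Gam}(a, b)$. Combining Proposition 4.3.6 and Example 4.3.5, and using $\mathbb{E}[\gamma] = a/b$ for the shape-rate parameterization, we have that

\[
\frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - \exp\left(-N\,\frac{a}{b}\left(\frac{\xi}{1+\xi}\right)^K\right).
\]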


3.4  Normalized  truncation  analysis 

In this section, we provide truncation error bounds for normalized CRMs (NCRMs). Examples include the Dirichlet process [Ferguson, 1973], the normalized gamma process [Brix, 1999, James, 2002, Lijoi and Prünster, 2003, Pitman, 2003, Lijoi et al., 2007, Lijoi and Prünster, 2010], and the normalized $\alpha$-stable process [Kingman, 1975, Lijoi and Prünster, 2010]. Given a CRM $\Theta$ on $\Psi$, we define the corresponding NCRM $\Xi$ via $\Xi(S) := \Theta(S)/\Theta(\Psi)$ for each measurable subset $S \subseteq \Psi$. Likewise, given a truncated CRM $\Theta_K$, we define its normalization $\Xi_K$ via $\Xi_K(S) := \Theta_K(S)/\Theta_K(\Psi)$. Note that any simulation algorithm for $\Theta_K$ can be used for $\Xi_K$ by simply normalizing the result. This does not depend on the particular representation of the CRM, and thus applies equally to all the representations in Section 4.2.

The first step in the analysis of NCRM truncations is to define their approximation error in a manner similar to that of CRM truncations. Since $\Xi$ and $\Xi_K$ are both normalized, they are distributions on $\Psi$; thus, observations $X_{1:N}$ are generated i.i.d. from $\Xi$, and $Z_{1:N}$ are generated i.i.d. from $\Xi_K$. $Y_{1:N}$ and $W_{1:N}$ have the same definition as for CRMs. As in the developments of Section 4.3, the theoretical results of this section rely on a general upper bound, provided by Lemma 4.4.1. Proposition 4.4.2 shows that the bound in Lemma 4.4.1 is tight without further assumptions on the data likelihood $f$.

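Lemma 4.4.1 (NCRM protobound). Let $\Theta \sim \mathrm{CRM}(\nu)$, and let its truncation be $\Theta_K$. Let their normalizations be $\Xi$ and $\Xi_K$ respectively. If

\[
X_n \mid \Xi \overset{\text{i.i.d.}}{\sim} \Xi, \qquad Z_n \mid \Xi_K \overset{\text{i.i.d.}}{\sim} \Xi_K,
\]
\[
Y_n \mid X_n \overset{\text{indep}}{\sim} f(\cdot \mid X_n), \qquad W_n \mid Z_n \overset{\text{indep}}{\sim} f(\cdot \mid Z_n),
\]

then

\[
\frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - \mathbb{P}\left(X_{1:N} \subseteq \mathrm{supp}(\Xi_K)\right),
\]

where $p_{N,\infty}$, $p_{N,K}$ are the marginal densities of $Y_{1:N}$ and $W_{1:N}$, respectively.

Proposition 4.4.2 (Protobound tightness). If $G$ is non-atomic and $\Psi$ is Borel, then for any $\delta > 0$, there exists a likelihood $f$ such that Lemma 4.4.1 is tight up to a factor of $1 - \delta$.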


The analysis of CRMs in Section 4.3 relied heavily on the Poisson process structure of the rates in $\Theta$ and $X_{1:N}$; unfortunately, the rates in $\Xi$ do not possess the same structure and thus lack many useful independence properties (the rates must sum to one). Likewise, sampling $X_n$ for each $n$ does not depend on the atoms of $\Xi$ independently ($X_n$ randomly selects a single atom based on their rates). Rather than using the basic definitions of the above random quantities to derive an error bound, we decouple the atoms of $\Xi$ and $X_{1:N}$ using a technique from extreme value theory. A Gumbel random variable $T$ with location $\mu \in \mathbb{R}$ and scale $\sigma > 0$, denoted $T \sim \mathrm{Gumbel}(\mu, \sigma)$, is defined by the cumulative distribution function and corresponding density

\[
\mathbb{P}(T \le t) = e^{-e^{-\frac{t-\mu}{\sigma}}} \qquad \text{and} \qquad \frac{1}{\sigma}\,e^{-\frac{t-\mu}{\sigma} - e^{-\frac{t-\mu}{\sigma}}}.
\]

An interesting property of the Gumbel distribution is that if one perturbs the log-probabilities of a finite discrete distribution by i.i.d. Gumbel(0, 1) random variables, the arg max of the resulting set is a sample from the discrete distribution [Gumbel, 1954, Maddison et al., 2014]. This technique is invariant to normalization, as the arg max is invariant to the corresponding constant shift in the log-transformed space. For present purposes, we develop the infinite extension of this result:

Lemma 4.4.3 (Infinite Gumbel-max sampling). Let $(p_i)_{i=1}^{\infty}$ be a collection of positive numbers such that $\sum_i p_i < \infty$ and let $\bar{p}_j := p_j / \sum_i p_i$. If $(T_i)_{i=1}^{\infty}$ are i.i.d. Gumbel(0, 1) random variables, then $\arg\max_{i \in \mathbb{N}} T_i + \log p_i$ exists, is unique a.s., and has distribution

\[
\arg\max_{i \in \mathbb{N}}\; T_i + \log p_i \sim \mathrm{Categorical}\left((\bar{p}_j)_{j=1}^{\infty}\right).
\]
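The finite version of the Gumbel-max property is simple to demonstrate. The sketch below perturbs the log of unnormalized positive weights with i.i.d. Gumbel(0, 1) noise and returns the arg max; it illustrates only the finite case (the lemma concerns the countably infinite extension), and the function names are hypothetical.

```python
import math
import random

def sample_gumbel(rng):
    """Gumbel(0, 1) draw: -log(E) with E ~ Exp(1), so P(T <= t) = exp(-exp(-t))."""
    e = 0.0
    while e == 0.0:              # guard against log(0)
        e = rng.expovariate(1.0)
    return -math.log(e)

def gumbel_max_sample(weights, rng):
    """Index i drawn with probability weights[i] / sum(weights); no normalization needed."""
    scores = [sample_gumbel(rng) + math.log(p) for p in weights]
    return max(range(len(weights)), key=scores.__getitem__)
```

Note that the weights are never normalized: rescaling all of them by a constant shifts every score by the same amount and leaves the arg max unchanged, which is the invariance the surrounding text relies on.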
The proof of this result, along with the others in this section, may be found in Appendix A.4. The utility of Lemma 4.4.3 is that it allows the construction of $\Xi$ and $X_{1:N}$ without the problematic coupling of the underlying CRM atoms due to normalization; rather than dealing directly with $\Xi$, we log-transform the rates of $\Theta$, perturb them by i.i.d. Gumbel(0, 1) random variables, and characterize the distribution of the maximum rate in this process. The combination of this distribution with Lemma 4.4.3 yields the key proof technique used to develop the truncation bounds in Theorems 4.4.4 and 4.4.5. The results presented in this section are summarized in Table 1 in Chapter 5.


3.4.1  Series  representations 


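The following result provides a general truncation error bound for normalized series representations, specifies its range, and guarantees that it decays to 0 as $K \to \infty$. We again use the general series representation notation from Eq. (6), where $g$ is a distribution on $\mathbb{R}_+$, and $\tau : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ is a measurable function such that $\lim_{u \to \infty} \tau(v, u) = 0$ for $g$-almost every $v$.

Theorem 4.4.4 (Normalized series representation truncation error bound). The error of approximating a series representation of $\Xi \sim \mathrm{NCRM}(\nu)$ with its truncation $\Xi_K$ satisfies

\[
0 \le \frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - (1 - B_K)^N \le 1,
\]

where

\[
B_K := \mathbb{E}\left[\int_0^{\infty} J(\Gamma_K, t)\left(\int_0^1 J(\Gamma_K u, t)\,\mathrm{d}u\right)^{K-1}\left(-\frac{\mathrm{d}}{\mathrm{d}t}\,e^{\int_0^{\infty}\left(J(u + \Gamma_K, t) - 1\right)\mathrm{d}u}\right)\mathrm{d}t\right],
\tag{20}
\]

\[
J(u, t) = \mathbb{E}\left[e^{-t\,\tau(V, u)}\right], \quad V \sim g, \quad \text{and} \quad \Gamma_K \sim \mathrm{Gam}(K, 1).
\]

Furthermore, $\lim_{K \to \infty} B_K = 0$.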


Example 4.4.1 (Dirichlet process, $\mathrm{DP}(\gamma)$, B-Rep). The Dirichlet process with concentration $\gamma > 0$ is a normalized gamma process $\mathrm{N}\Gamma\mathrm{P}(\gamma, 1, 0)$. From Example 4.2.2 we have $c_\nu = \gamma$ and $g_\nu = \mathrm{Exp}(1)$, and from Section 4.3.1 we have $\tau(v, u) = v e^{-u/c_\nu}$. Therefore $J$ and its antiderivative are

\[
J(u, t) = \mathbb{E}\left[e^{-tVe^{-u/\gamma}}\right] = \left(1 + te^{-u/\gamma}\right)^{-1} \qquad \text{and} \qquad \int J(u, t)\,\mathrm{d}u = \gamma\log\left(e^{u/\gamma} + t\right).
\]

Using the antiderivative to evaluate the integrals in the formula for $B_K$, writing the expectation over $\Gamma_K \sim \mathrm{Gam}(K, 1)$ explicitly, and making a change of variables we have


where  the  last  equality  is  found  by  multiplying  and  dividing  the  integrand  by  (1  +  /;)~(7+2-),  and 
making  the  change  of  variables  from  s  to  x  =  log  fyj.  Therefore,  the  truncation  error  can  be 
bounded  by 

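\[
\frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - \left(1 - \left(\frac{\gamma}{\gamma+1}\right)^K\right)^N \sim N\left(\frac{\gamma}{\gamma+1}\right)^K, \quad K \to \infty.
\]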




The  bound  in  Example  4.4.2  has  exponential  decay,  and  reproduces  earlier  DP  truncation  error 
bound  rates  due  to  Ishwaran  and  James  [2001]  and  Ishwaran  and  Zarepour  [2002].  However,  the 
techniques  used  in  past  work  do  not  generalize  beyond  the  Dirichlet  process,  while  those  developed 
here  apply  to  any  NCRM. 


3.4.2 Superposition representations


The following result provides a general truncation error bound for normalized superposition representations, specifies its range, and guarantees that it decays to 0 as $K \to \infty$. We once again rely on the property that the truncation $\Theta_K$ and tail $\Theta_K^+$ are mutually independent CRMs, as expressed in Eq. (16), with the tail measure denoted $\nu_K^+$.

Theorem 4.4.5 (Truncation error bound for normalized superposition representations). The error of approximating a superposition representation of $\Xi \sim \mathrm{NCRM}(\nu)$ with its truncation $\Xi_K$ satisfies

\[
0 \le \frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - (1 - B_K)^N \le 1,
\]

where

\[
B_K := \int_0^{\infty}\left(\int \theta e^{-\theta t}\,\nu_K^+(\mathrm{d}\theta)\right)\exp\left(\int\left(e^{-\theta t} - 1\right)\nu(\mathrm{d}\theta)\right)\mathrm{d}t.
\tag{21}
\]

Furthermore, $\lim_{K \to \infty} B_K = 0$.


This  bound  can  be  applied  by  using  the  tail  measures  derived  earlier  in  Section  4.3.2. 

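Example 4.4.2 (Dirichlet process, $\mathrm{DP}(\gamma)$, DB-Rep). As in Example 4.4.1, we view the Dirichlet process with concentration $\gamma > 0$ as a normalized gamma process $\mathrm{N}\Gamma\mathrm{P}(\gamma, 1, 0)$. First, by Lemma A.1.8, the integral in the exponential is

\[
\exp\left(\int\left(e^{-t\theta} - 1\right)\nu(\mathrm{d}\theta)\right) = \exp\left(\gamma\int_0^{\infty}\left(e^{-t\theta} - 1\right)\theta^{-1}e^{-\theta}\,\mathrm{d}\theta\right) = (t+1)^{-\gamma}.
\]

Example 4.2.2 shows $c_\nu = \gamma$ and $g_\nu(v) = e^{-v}$, and Eq. (27) provides the tail measure $\nu_K^+$ for the decoupled Bondesson representation,

\[
\nu_K^+(\mathrm{d}\theta) = \frac{\gamma}{\xi}\sum_{k=K+1}^{\infty}\frac{\xi^k}{\Gamma(k)}\left(\int_0^1 (-\log x)^{k-1}x^{\xi-2}e^{-\theta x^{-1}}\,\mathrm{d}x\right)\mathrm{d}\theta.
\]

Substituting this result, using Fubini's theorem to swap the order of integration and summation, evaluating the integral over $\theta$, and making the substitution $x = e^{-s}$ yields

\[
B_K = \frac{\gamma}{\xi}\sum_{k=K+1}^{\infty}\frac{\xi^k}{\Gamma(k)}\int_0^{\infty}\int_0^{\infty}\frac{s^{k-1}e^{-(\xi-1)s}}{(e^s + t)^2}\,(t+1)^{-\gamma}\,\mathrm{d}s\,\mathrm{d}t.
\]

Noting that $\forall s \ge 0$, $e^s \ge 1$, we have for any $a \in (0, 1] \cap (0, \gamma)$,

\[
B_K \le \frac{\gamma}{\xi}\sum_{k=K+1}^{\infty}\frac{\xi^k}{\Gamma(k)}\int_0^{\infty} s^{k-1}e^{-(\xi+a)s}\,\mathrm{d}s\int_0^{\infty}(t+1)^{-(\gamma+1-a)}\,\mathrm{d}t
= \frac{\gamma}{(\gamma-a)\xi}\sum_{k=K+1}^{\infty}\left(\frac{\xi}{\xi+a}\right)^k
= \frac{\gamma}{a(\gamma-a)}\left(\frac{\xi}{\xi+a}\right)^K.
\]

Therefore, for any $a \in (0, 1] \cap (0, \gamma)$,

\[
\frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - \left(1 - \frac{\gamma}{a(\gamma-a)}\left(\frac{\xi}{\xi+a}\right)^K\right)^N \sim \frac{N\gamma}{a(\gamma-a)}\left(\frac{\xi}{\xi+a}\right)^K, \quad K \to \infty.
\]

To find the tightest bound, one can minimize with respect to $a$ given $\gamma$, $\xi$, $K$.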
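The minimization over the free parameter $a$ mentioned above can be carried out numerically. The sketch below uses a plain grid search over the interior of $(0, 1] \cap (0, \gamma)$, taking $B_K(a) = \frac{\gamma}{a(\gamma - a)}\left(\frac{\xi}{\xi + a}\right)^K$ as read from the example; the grid search, the function names, and the `grid` resolution are illustrative assumptions rather than a method from the text.

```python
def db_rep_dp_bk(a, K, gamma, xi):
    """B_K(a) = gamma / (a * (gamma - a)) * (xi / (xi + a))**K, for 0 < a < gamma."""
    return gamma / (a * (gamma - a)) * (xi / (xi + a)) ** K

def best_db_rep_dp_bound(N, K, gamma, xi, grid=1000):
    """Smallest value of 1 - (1 - B_K(a))**N over a grid of admissible a."""
    upper = min(1.0, gamma)
    best = 1.0                      # the trivial bound always holds
    for i in range(1, grid):
        a = upper * i / grid        # strictly inside (0, min(1, gamma))
        bk = db_rep_dp_bk(a, K, gamma, xi)
        if bk < 1.0:                # otherwise the bound is vacuous
            best = min(best, 1.0 - (1.0 - bk) ** N)
    return best
```

A finer grid or a one-dimensional optimizer could replace the loop; the grid version is shown only because it has no dependencies and makes the admissible range of $a$ explicit.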

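Example 4.4.3 (Normalized gamma process, $\mathrm{N}\Gamma\mathrm{P}(\gamma, \lambda, d)$, SB-Rep). By Lemma A.1.8, the integral in the exponential is

\[
\exp\left(\int\left(e^{-\theta t} - 1\right)\nu(\mathrm{d}\theta)\right) = \begin{cases}
\exp\left(-\gamma\lambda^{1-d}d^{-1}\left((t+\lambda)^d - \lambda^d\right)\right) & d > 0 \\
\left(\dfrac{\lambda}{t+\lambda}\right)^{\gamma\lambda} & d = 0,
\end{cases}
\tag{22}
\]

and the standard gamma integral yields

\[
\int \theta e^{-\theta t}\,\nu_K^+(\mathrm{d}\theta) = \gamma\,\frac{\lambda^{1-d}}{\Gamma(1-d)}\int \theta^{-d}e^{-(K+t+\lambda)\theta}\,\mathrm{d}\theta = \gamma\lambda^{1-d}(K+t+\lambda)^{d-1}.
\tag{23}
\]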

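When $d > 0$, multiplying the previous two displays and integrating over $t \ge 0$ yields

\[
B_K = \gamma\lambda^{1-d}e^{\gamma\lambda/d}\int_{\lambda}^{\infty}(K+t)^{d-1}e^{-\gamma\lambda^{1-d}t^d/d}\,\mathrm{d}t \le C_{\gamma,\lambda,d}\,(K+\lambda)^{d-1},
\]

where we have used $(K+t)^{d-1} \le (K+\lambda)^{d-1}$ for $t \ge \lambda$ and the change of variables $u = \gamma\lambda^{1-d}d^{-1}t^d$ to find that $C_{\gamma,\lambda,d} = e^{a}\left(\gamma\lambda^{1-d}\right)^{1-d^{-1}}d^{\,d^{-1}-1}\,\Gamma(d^{-1}, a)$, where $a = \gamma\lambda d^{-1}$ and $\Gamma(a, x) := \int_x^{\infty}\theta^{a-1}e^{-\theta}\,\mathrm{d}\theta$ is the upper incomplete gamma function. Therefore,

\[
\frac{1}{2}\|p_{N,\infty} - p_{N,K}\|_1 \le 1 - \left(1 - C_{\gamma,\lambda,d}\,(K+\lambda)^{d-1}\right)^N \sim N C_{\gamma,\lambda,d}\,K^{d-1}, \quad K \to \infty.
\]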

When $d = 0$, multiplying Eqs. (22) and (23) and integrating over $t \ge 0$ yields


JD  p  OO 

^  =  ,/A  (^+r1t-Adt< 


(A^  +  A)-1 

A'-1log(A±A) 


^ix(A'+A)1-'>'a-A1-t'a 


1— 7A 


7A  7^  1 
7A  =  1, 


where we obtain the bound for $\gamma\lambda \ne 1$ by splitting the integral into the intervals $[\lambda, K+\lambda]$ and $[K+\lambda, \infty)$ and bounding each section separately, and we obtain the bound for $\gamma\lambda = 1$ via the transformation $u = t/(K+t)$. Therefore, asymptotically


1 

2 


l|PtV,oo 


Pn,k\\i 


CldK~ yA^l 
A  A'-1  log  K  7A  =  1 


K  — >  cx), 


where  C1,\  :  =  max 


ata  7A2  A 
1— 7 A  ’  7A— 1  J 


Truncation of the $\mathrm{N}\Gamma\mathrm{P}(\gamma, \lambda, d)$ has been studied previously: Argiento et al. [2016] threshold the weights of the unnormalized CRM to be beyond a fixed level $\epsilon > 0$ prior to normalization, and develop error bounds for that method of truncation. These results are not directly comparable to those of the present work due to the different methods of truncation (i.e. sequential representation termination versus weight thresholding).
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3.4.3  Hyperpriors 

As in the CRM case, we can place priors on the hyperparameters of the NCRM rate measure (i.e. $\gamma$, $\alpha$, $\lambda$, $d$, etc.). We conclude our investigation of NCRM truncation error by showing how bounds developed in this section can be modified to account for hyperpriors. Note that we make the dependence of $B_K$ on the hyperparameters $\Phi$ explicit in the notation of Proposition 4.4.6.

Proposition 4.4.6 (NCRM truncation error with a hyperprior). Given hyperparameters $\Phi$, consider a representation for $\Theta \mid \Phi \sim \mathrm{CRM}(\nu)$, let $\Xi \mid \Phi$ be its normalization, and let $B_K(\Phi)$ be given by Eq. (20) (for a series representation) or Eq. (21) (for a superposition representation). The error of approximating $\Xi$ with its truncation $\Xi_K$ satisfies

\[ 0 \le \frac{1}{2}\left\|p_{N,\infty} - p_{N,K}\right\|_1 \le 1 - \left(1 - \mathbb{E}\left[B_K(\Phi)\right]\right)^{N} \le 1. \]

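Example 4.4.4 (Dirichlet process, $\mathrm{DP}(\gamma)$, B-Rep). If we place a Lomax prior on $\gamma$, i.e. $\gamma \sim \mathrm{LomP}(a, 1)$, then combining Proposition 4.4.6 and Example 4.4.1 yields

\[ \frac{1}{2}\left\|p_{N,\infty} - p_{N,K}\right\|_1 \le 1 - \left(1 - \frac{\Gamma(a+1)\,\Gamma(K+1)}{\Gamma(a+K+1)}\right)^{N} \sim N\,\Gamma(a+1)\,(K+1)^{-a}, \qquad K \to \infty. \]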
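To make the rate of decay in Example 4.4.4 concrete, the following minimal sketch evaluates the expected bound $\Gamma(a+1)\Gamma(K+1)/\Gamma(a+K+1)$ from the display above, the resulting error bound, and its asymptotic form. The function names are illustrative assumptions (they do not come from the original work), and only the Python standard library is used.

```python
import math


def expected_bk(a: float, K: int) -> float:
    """E[B_K] = Gamma(a+1) Gamma(K+1) / Gamma(a+K+1), evaluated in log space."""
    return math.exp(math.lgamma(a + 1) + math.lgamma(K + 1) - math.lgamma(a + K + 1))


def dp_lomax_error_bound(a: float, K: int, N: int) -> float:
    """Upper bound 1 - (1 - E[B_K])^N on the truncation error."""
    return 1.0 - (1.0 - expected_bk(a, K)) ** N


def dp_lomax_asymptotic(a: float, K: int, N: int) -> float:
    """Asymptotic form N * Gamma(a+1) * (K+1)^(-a), valid as K grows."""
    return N * math.gamma(a + 1) * (K + 1) ** (-a)
```

For example, with $a = 2$ and $K = 10$ the expected bound equals $1/66$, and for large $K$ the bound and its asymptotic form agree closely.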

3.5  Simulation  and  computational  complexity 

The  sequential  representations  in  Section  4.2  are  each  generated  from  a  different  finite  sequence 
of  distributions,  resulting  in  a  different  expected  computational  cost  for  the  same  truncation  level. 
Thus,  the  truncation  level  itself  is  not  an  appropriate  parameter  with  which  to  compare  the  error 
bounds  for  different  representations  and  we  require  a  characterization  of  the  computational  cost. 

We investigate the mean complexity $\mathbb{E}[R]$ of each representation, where $R$ is the number of random variables sampled, as a function of the truncation level for each of the representations in Section 4.2.

We begin with the series representations. For each value of $k = 1, \dots, K$, each series representation generates a single trait $\psi_k \sim G$ and a rate $\theta_k$ composed of some transformation of random variables. Thus, all of the series representations in this work satisfy $\mathbb{E}[R] = rK$ for some constant $r$: by inspection, the inverse-Lévy representation has $r = 2$, and all the remaining series representations have $r = 3$.

The superposition representations, on the other hand, generate a Poisson random variable to determine the number of atoms at each value of $k = 1, \dots, K$, and then generate those atoms. Therefore, the mean simulation complexity takes the form $\mathbb{E}[R] = \sum_{k=1}^{K}\left(1 + r_k\,\mathbb{E}[C_k]\right)$ for some constants $r_k$ that might depend on the value of $k$. For the decoupled Bondesson representation, $r_k = 3$ since each atom requires generating three values ($\psi_{ki}$, $V_{ki}$, and $T_{ki}$), and $\mathbb{E}[C_k] = c/\xi$, so $\mathbb{E}[R] = \left(3c/\xi + 1\right)K$. For the size-biased representation, $r_k = 3$ since each atom requires generating three values ($\psi_{ki}$, $z_{ki}$, and $\theta_{ki}$), and $\mathbb{E}[C_k] = \eta_k$, so $\mathbb{E}[R] = K + 3\sum_{k=1}^{K}\eta_k$. Note that here $\mathbb{E}[R] \sim K$ for $K \to \infty$ since $\eta_k$ is a decreasing sequence. For the power-law representation, $r_k = k + 2$, since each atom requires generating $\psi_{ki}$, $V_{ki}$, and $k$ beta random variables, and therefore $\mathbb{E}[R] = \left(1 + \frac{5\gamma}{2}\right)K + \frac{\gamma}{2}K^2$.
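The cost formulas above can be sketched in a few lines. This is a minimal illustration rather than code from the original work: the function names are hypothetical, and the power-law case assumes $\mathbb{E}[C_k] = \gamma$ for every $k$ (an assumption made here so that the sum has a closed form).

```python
def series_cost(K, r):
    """E[R] = r * K for a series representation (r = 2 for inverse-Levy, r = 3 otherwise)."""
    return r * K


def superposition_cost(K, r, mean_atoms):
    """E[R] = sum_{k=1}^K (1 + r(k) * E[C_k]); r and mean_atoms are functions of k."""
    return sum(1 + r(k) * mean_atoms(k) for k in range(1, K + 1))


def decoupled_bondesson_cost(K, c, xi):
    """Three values per atom and E[C_k] = c / xi, giving (3c/xi + 1) K."""
    return superposition_cost(K, lambda k: 3, lambda k: c / xi)


def size_biased_cost(K, eta):
    """Three values per atom and E[C_k] = eta(k), giving K + 3 * sum_k eta(k)."""
    return superposition_cost(K, lambda k: 3, eta)


def power_law_cost(K, gamma):
    """k + 2 values per atom in round k; assumes E[C_k] = gamma (hypothetical choice)."""
    return superposition_cost(K, lambda k: k + 2, lambda k: gamma)
```

Comparing `series_cost(K, 3)` with `power_law_cost(K, gamma)` for growing `K` shows the linear versus quadratic growth that motivates plotting error bounds against $\mathbb{E}[R]$ rather than against $K$.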
Figure 1: Truncation error bounds for the beta-Bernoulli process.


3.6  Additional  Examples 


In  this  section,  we  provide  example  applications  of  our  theory  to  the  beta  process  and  to  the  beta 
prime  process. 


3.6.1  Beta  process 

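The beta process $\mathrm{BP}(\gamma, \alpha, d)$ [Teh and Görür, 2009, Broderick et al., 2012] with discount parameter $d \in [0, 1)$, concentration parameter $\alpha > -d$, and mass parameter $\gamma > 0$, is a CRM with rate measure

\[ \nu(\mathrm{d}\theta) = \gamma\,\frac{\Gamma(\alpha+1)}{\Gamma(1-d)\,\Gamma(\alpha+d)}\,\mathbb{1}[\theta \le 1]\,\theta^{-1-d}(1-\theta)^{\alpha+d-1}\,\mathrm{d}\theta. \tag{24} \]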

Setting $d = 0$ yields the standard beta process [Hjort, 1990, Thibaux and Jordan, 2007]. The beta process is often paired with a Bernoulli likelihood or negative binomial likelihood with $s \in \mathbb{N}$ failures:

\[ \text{Bern:}\quad h(x \mid \theta) = \mathbb{1}[x \le 1]\,\theta^{x}(1-\theta)^{1-x}, \]

\[ \text{NegBinom:}\quad h(x \mid \theta) = \binom{x+s-1}{x}(1-\theta)^{s}\theta^{x}. \]

Note that for the Bernoulli likelihood $\pi(\theta) = 1 - \theta$ and for the negative binomial likelihood $\pi(\theta) = (1-\theta)^{s}$.

Bondesson representation. If $\alpha > 1$ and $d = 0$, then $\theta\nu(\theta) = \gamma\alpha(1-\theta)^{\alpha-1}\,\mathbb{1}[\theta \le 1]$ is non-increasing, $c_\nu = \lim_{\theta \to 0}\theta\nu(\theta) = \gamma\alpha$, and $g_\nu(v) = (\alpha-1)(1-v)^{\alpha-2} = \mathrm{Beta}(v; 1, \alpha-1)$. Thus, it follows from Theorem 4.2.1 that if $\Theta \leftarrow \text{B-Rep}(\gamma\alpha, \mathrm{Beta}(1, \alpha-1))$, then $\Theta \sim \mathrm{BP}(\gamma, \alpha, 0)$. In the case of $\alpha = 1$, $g_\nu(v) = \delta_1$, so $V_k = 1$ and the Bondesson representation is equivalent to the inverse-Lévy representation. Since $\exp(-E_k/c) \sim \mathrm{Beta}(c, 1)$, the representation used in Teh et al. [2007] is equivalent to the Bondesson representation for $\mathrm{BP}(\gamma, 1, 0)$.
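To obtain a truncation bound in the Bernoulli likelihood case, we can argue as in Example 4.3.2:

\begin{align*}
B_{N,K} &\le N\int_0^1 \left(1 - \mathbb{E}\left[\pi\left(v e^{-G_K/(\gamma\alpha)}\right)\right]\right)\nu(\mathrm{d}v) \\
&= N\gamma\alpha\,\mathbb{E}\left[e^{-G_K/(\gamma\alpha)}\right]\int_0^1 (1-v)^{\alpha-1}\,\mathrm{d}v \\
&= N\gamma\left(\frac{\gamma\alpha}{1+\gamma\alpha}\right)^{K}. \tag{25}
\end{align*}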


This result generalizes that in Doshi-Velez et al. [2009], which applies only when $\alpha = 1$.
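Because the bound in Eq. (25) decays geometrically in $K$, it can be inverted to choose a truncation level. The sketch below is illustrative only: the function names and the tolerance argument `tol` are assumptions introduced here, not part of the original analysis.

```python
import math


def bp_bondesson_bound(N, K, gamma, alpha):
    """Right-hand side of Eq. (25): N * gamma * (gamma*alpha / (1 + gamma*alpha))**K."""
    rho = gamma * alpha / (1.0 + gamma * alpha)
    return N * gamma * rho ** K


def min_truncation_level(N, gamma, alpha, tol):
    """Smallest K >= 0 for which bp_bondesson_bound(N, K, gamma, alpha) <= tol."""
    if N * gamma <= tol:
        return 0
    rho = gamma * alpha / (1.0 + gamma * alpha)
    K = math.ceil(math.log(tol / (N * gamma)) / math.log(rho))
    # Guard against floating-point rounding at the boundary.
    while bp_bondesson_bound(N, K, gamma, alpha) > tol:
        K += 1
    return K
```

With $N = 5$, $\gamma = 1$, $\alpha = 1$ the bound is $5 \cdot 2^{-K}$, so a tolerance of $0.01$ is first met at $K = 9$.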

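Theorem 4.2.1 does not apply directly when $\alpha < 1$, since $\lim_{\theta \to 1}\theta\nu(\theta) = \infty$. However, a representation can be obtained by using a trick from Paisley et al. [2012]. For $\alpha > 0$, let

\[ \Theta' = \sum_{k=1}^{C}\theta'_k\,\delta_{\psi'_k}, \]

where $C \sim \mathrm{Poiss}(\gamma)$, $\theta'_k \sim \mathrm{Beta}(1, \alpha)$, and $\psi'_k \sim G$. Thus, $\Theta'$ is a CRM with rate measure $\gamma\alpha(1-\theta)^{\alpha-1}\,\mathbb{1}[\theta \le 1]\,\mathrm{d}\theta$. If $\Theta \sim \mathrm{BP}(\gamma\alpha/(\alpha+1), \alpha+1, 0)$, which can be generated according to Theorem 4.2.1, then $\Theta'' = \Theta + \Theta'$ is a CRM with rate density on $[0, 1]$ given by

\[ \gamma\alpha\,\theta^{-1}(1-\theta)^{\alpha} + \gamma\alpha\,(1-\theta)^{\alpha-1} = \gamma\alpha\,\theta^{-1}(1-\theta)^{\alpha-1}, \]

hence $\Theta'' \sim \mathrm{BP}(\gamma, \alpha, 0)$.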
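For readers who want the intermediate step, the identity for the rate density of $\Theta''$ can be checked by factoring out the common term. The first summand is the rate density of $\mathrm{BP}(\gamma\alpha/(\alpha+1), \alpha+1, 0)$, whose constant from Eq. (24) with $d = 0$ is $\frac{\gamma\alpha}{\alpha+1}\cdot(\alpha+1) = \gamma\alpha$. The following worked step is a sketch added for clarity:

```latex
\begin{align*}
\gamma\alpha\,\theta^{-1}(1-\theta)^{\alpha} + \gamma\alpha\,(1-\theta)^{\alpha-1}
  &= \gamma\alpha\,(1-\theta)^{\alpha-1}\left[\theta^{-1}(1-\theta) + 1\right] \\
  &= \gamma\alpha\,(1-\theta)^{\alpha-1}\,\theta^{-1}.
\end{align*}
```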

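Thinning representation. If we let $g = \mathrm{Beta}(1-d, \alpha+d)$, then the thinning representation for $\mathrm{BP}(\gamma, \alpha, d)$ is

\[ \Theta = \sum_{k=1}^{\infty} V_k\,\mathbb{1}\left(V_k\Gamma_k \le \gamma\right)\delta_{\psi_k}, \quad\text{with}\quad V_k \overset{\text{i.i.d.}}{\sim} \mathrm{Beta}(1-d, \alpha+d). \]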


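Rejection representation. To obtain a rejection representation for any $d$ when $\alpha > 1 - d$, let $\mu$ be the rate measure for $\mathrm{BP}(\gamma', 1-d, d)$. We then have that

\[ \mu[x, \infty) = \begin{cases} \gamma' d^{-1}\left(x^{-d} - 1\right) & d > 0 \\ -\gamma'\log x & d = 0 \end{cases} \qquad\text{and}\qquad \mu^{\leftarrow}(u) = \begin{cases} \left(1 + du/\gamma'\right)^{-1/d} & d > 0 \\ e^{-u/\gamma'} & d = 0, \end{cases} \]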

where  7'  :=  7f(ir^f^+d)-  Thus,  we  can  apply  the  inverse-Levy  method  analytically  for  //.  Since 
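we have constructed $\mu$ such that $\mathrm{d}\nu/\mathrm{d}\mu \le 1$, we can use $\mu$ to construct the rejection representation

\[ \Theta = \sum_{k=1}^{\infty} V_k\,\mathbb{1}\left(U_k \le (1-V_k)^{\alpha+d-1}\right)\delta_{\psi_k} \sim \mathrm{BP}(\gamma, \alpha, d), \qquad \alpha > 1 - d, \]

with

\[ V_k = \begin{cases} \left(1 + d\,\Gamma_k/\gamma'\right)^{-1/d} & d > 0 \\ e^{-\Gamma_k/\gamma'} & d = 0, \end{cases} \qquad U_k \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}[0, 1]. \]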
The expected number of rejections is


-d 


-i 


1  +  2 d2F1(0’°’1’0)(— a  -  d,  -d,  -d; 1)  +  d2F^’u’U)(-a  -  d,  -d,  -d;  1) 


(0,1, 0,0)  , 


where ${}_2F_1$ is the ordinary hypergeometric function and the parenthetical superscripts indicate partial derivatives. This quantity monotonically diverges to $\infty$ as $d \to 1$.

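To obtain a truncation bound in the Bernoulli likelihood case, we consider the $d > 0$ and $d = 0$ settings separately. If $d > 0$, we have

\begin{align*}
\frac{B_{N,K}}{N\gamma'} &\le \int_0^1 F_K\left(\gamma' d^{-1}(x^{-d} - 1)\right)x^{-d}(1-x)^{\alpha+d-1}\,\mathrm{d}x \\
&\le \int_0^a x^{-d}(1-x)^{\alpha+d-1}\,\mathrm{d}x + F_K\left(\gamma' d^{-1}(a^{-d} - 1)\right)\int_a^1 x^{-d}(1-x)^{\alpha+d-1}\,\mathrm{d}x \\
&\le \int_0^a x^{-d}\,\mathrm{d}x + a^{-d}F_K\left(\gamma' d^{-1}(a^{-d} - 1)\right)\int_a^1 (1-x)^{\alpha+d-1}\,\mathrm{d}x \\
&\le (1-d)^{-1}a^{1-d} + a^{-d}\left(3\gamma' d^{-1}(a^{-d} - 1)/K\right)^{K} \\
&\le (1-d)^{-1}a^{1-d} + \left(3\gamma' d^{-1}/K\right)^{K}a^{-(K+1)d}.
\end{align*}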


Setting  the  two  terms  equal  and  solving  for  a  we  obtain  a1+d(K  ^  =  (3y '(d  1  —  1)/K)K  and 
conclude  that 


Bn, k  <  2N1' 


3 i(d~l  -  1)  \  i+^-D 
K 


2iY7' 


,  (3i(d~l  -  l)\1/d 
K 


K  — >■  oo. 


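If $d = 0$, we have

\begin{align*}
\frac{B_{N,K}}{N\gamma'} &\le \int_0^1 F_K\left(-\gamma'\log x\right)(1-x)^{\alpha-1}\,\mathrm{d}x \\
&\le \int_0^a (1-x)^{\alpha-1}\,\mathrm{d}x + F_K\left(-\gamma'\log a\right)\int_a^1 (1-x)^{\alpha-1}\,\mathrm{d}x \\
&\le a + \left(-3\gamma'\log a/K\right)^{K}.
\end{align*}

Setting the two terms equal and solving for $a$, and noting that $\gamma' = \gamma\alpha$ when $d = 0$, we conclude that

\[ B_{N,K} \le 2N\gamma\alpha\,e^{-K W_0\left(\{3\gamma\alpha\}^{-1}\right)}, \]

where $W_0$ is as defined in Eq. (11).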
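The $d = 0$ bound can be evaluated numerically once $W_0$ is available. The sketch below assumes that $W_0$ in Eq. (11) is the principal branch of the product log (Lambert W) function, and computes it by Newton's method using only the standard library; the function names are illustrative and not taken from the original work.

```python
import math


def lambert_w0(z, tol=1e-14):
    """Principal branch W_0(z) for z >= 0: the solution w >= 0 of w * exp(w) = z."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # starting point at or above the root
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))  # Newton step for f(w) = w e^w - z
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def bp_rejection_bound_d0(N, K, gamma, alpha):
    """2 * N * gamma * alpha * exp(-K * W_0(1 / (3 * gamma * alpha)))."""
    return 2.0 * N * gamma * alpha * math.exp(-K * lambert_w0(1.0 / (3.0 * gamma * alpha)))
```

As a consistency check, $a = e^{-K W_0(1/(3\gamma\alpha))}$ makes the two terms $a$ and $(-3\gamma\alpha\log a/K)^K$ equal, because $W_0(z)e^{W_0(z)} = z$ implies $e^{-W_0(z)} = W_0(z)/z$.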


Decoupled Bondesson and power-law representations. The decoupled Bondesson representation for $\mathrm{BP}(\gamma, \alpha, 0)$ from Paisley et al. [2010] was extended by Broderick et al. [2012] to the $\mathrm{BP}(\gamma, \alpha, d)$ setting. The Broderick et al. [2012] construction for the $\mathrm{BP}(\gamma, \alpha, d)$ is in fact the “trivial” power-law representation $\text{PL-Rep}(\gamma, \alpha, d, \delta_1)$ (the decoupled Bondesson representation is the special case when $d = 0$).

In the Bernoulli likelihood case, the truncation bound for the decoupled Bondesson representation is

\[ B_{N,K} \le \frac{N\gamma\alpha}{\xi}\sum_{k=K+1}^{\infty}\mathbb{E}\left[V e^{-T_k}\right]. \]
For the power-law representation, by the same arguments as Example 4.3.7,

\[ B_{N,K} \le N\gamma\prod_{k=1}^{K}\frac{\alpha + kd}{\alpha + kd - d + 1}. \]


This  result  generalizes  that  in  Paisley  et  al.  [2012],  which  applies  only  when  d  =  0. 


Size-biased representation. The size-biased representation of the beta process is well-established and we refer the reader to Broderick et al. [2012, 2017] for details. We note that the standard beta integral yields

\[ \eta_k = \eta_{k1} = \int \pi(\theta)^{k-1}\left(1 - \pi(\theta)\right)\nu(\mathrm{d}\theta) = \gamma\,\frac{\Gamma(\alpha+1)\,\Gamma(\alpha+d+k-1)}{\Gamma(\alpha+d)\,\Gamma(\alpha+k)}, \]

and for $i > 1$, $\eta_{ki} = 0$. Hence $z_{ki} = 1$ almost surely and $\theta_{ki} \sim \mathrm{Beta}(1-d, \alpha+d+k-1)$, demonstrating that the construction due to Thibaux and Jordan [2007] is a special case of the size-biased representation for $\mathrm{BP}(\gamma, \alpha, d)$.

To obtain a truncation bound for the Bernoulli likelihood case, first consider the $d > 0$ setting. Using Lemma A.1.6 to simplify the sum in Eq. (18), we have

\begin{align*}
B_{N,K} &\le \frac{\gamma}{d}\,\frac{\Gamma(\alpha+1)}{\Gamma(\alpha+d)}\left(\frac{\Gamma(\alpha+d+K+N)}{\Gamma(\alpha+K+N)} - \frac{\Gamma(\alpha+d+K)}{\Gamma(\alpha+K)}\right) \\
&\sim \gamma N\,\frac{\Gamma(\alpha+1)}{\Gamma(\alpha+d)}\,K^{d-1}, \qquad K \to \infty,
\end{align*}

where the asymptotic result follows from Lemmas A.1.4, A.1.7 and A.1.9. When $d = 0$, we can again use Lemma A.1.6 to arrive at

\[ B_{N,K} \le \gamma\alpha\left(\psi(\alpha+N+K) - \psi(\alpha+K)\right) \sim \gamma\alpha N K^{-1}, \qquad K \to \infty, \]

where $\psi(\cdot)$ is the digamma function, and the asymptotic result follows from Lemma A.1.4.
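The closed form for $d > 0$ can be checked against a direct sum of the rates $\eta_k$. The sketch below assumes that the sum in Eq. (18) runs over $\eta_{K+1}, \dots, \eta_{K+N}$ (which is what the telescoping form suggests) and that $\alpha > 0$; the function names are illustrative.

```python
import math


def eta(k, gamma, alpha, d):
    """eta_k = gamma * Gamma(alpha+1) Gamma(alpha+d+k-1) / (Gamma(alpha+d) Gamma(alpha+k))."""
    return gamma * math.exp(
        math.lgamma(alpha + 1) - math.lgamma(alpha + d)
        + math.lgamma(alpha + d + k - 1) - math.lgamma(alpha + k)
    )


def sb_bound_direct(N, K, gamma, alpha, d):
    """Direct sum eta_{K+1} + ... + eta_{K+N} (assumed form of the sum in Eq. (18))."""
    return sum(eta(K + n, gamma, alpha, d) for n in range(1, N + 1))


def sb_bound_closed_form(N, K, gamma, alpha, d):
    """Closed form for d > 0 obtained via Lemma A.1.6."""
    def f(x):
        return math.exp(math.lgamma(alpha + d + x) - math.lgamma(alpha + x))
    prefactor = gamma / d * math.exp(math.lgamma(alpha + 1) - math.lgamma(alpha + d))
    return prefactor * (f(K + N) - f(K))


def sb_bound_asymptotic(N, K, gamma, alpha, d):
    """Asymptotic form gamma * N * Gamma(alpha+1) / Gamma(alpha+d) * K^(d-1)."""
    return gamma * N * math.exp(math.lgamma(alpha + 1) - math.lgamma(alpha + d)) * K ** (d - 1)
```

For moderate $K$ the direct sum and the closed form agree to floating-point accuracy, while the asymptotic form becomes accurate only as $K$ grows relative to $N$.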

We can also bound the truncation error in the case of the negative binomial likelihood. For a fixed number of failures $s > 0$, and assuming $\alpha + d + (k-1)s > 1$, integration by parts yields

\[ \int \pi(\theta)^{k-1}\left(1 - \pi(\theta)\right)\nu(\mathrm{d}\theta) = \frac{\gamma}{d}\,\frac{\Gamma(\alpha+1)}{\Gamma(\alpha+d)}\left(\frac{\Gamma(\alpha+d+ks)}{\Gamma(\alpha+ks)} - \frac{\Gamma(\alpha+d+(k-1)s)}{\Gamma(\alpha+(k-1)s)}\right). \tag{26} \]

When $d > 0$, the sum from Eq. (18) is telescoping, so canceling terms,

\begin{align*}
B_{N,K} &\le \frac{\gamma}{d}\,\frac{\Gamma(\alpha+1)}{\Gamma(\alpha+d)}\left(\frac{\Gamma(\alpha+d+(K+N)s)}{\Gamma(\alpha+(K+N)s)} - \frac{\Gamma(\alpha+d+Ks)}{\Gamma(\alpha+Ks)}\right) \\
&\sim \gamma N s^{d}\,\frac{\Gamma(\alpha+1)}{\Gamma(\alpha+d)}\,K^{d-1}, \qquad K \to \infty,
\end{align*}

where the asymptotic result follows from Lemmas A.1.4, A.1.7 and A.1.9. To analyze the case where $d = 0$, we can use L'Hospital's rule to take the limit of Eq. (26) as $d \to 0$, yielding

\[ \lim_{d \to 0}\int \pi(\theta)^{k-1}\left(1 - \pi(\theta)\right)\nu(\mathrm{d}\theta) = \gamma\alpha\left(\psi(\alpha+ks) - \psi(\alpha+(k-1)s)\right). \]

Again computing the error bound by canceling terms in the telescoping sum,

\[ B_{N,K} \le \gamma\alpha\left(\psi(\alpha+(K+N)s) - \psi(\alpha+Ks)\right) \sim \gamma\alpha N K^{-1}, \qquad K \to \infty, \]

where the asymptotic result follows from an application of Lemma A.1.4.
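For an integer number of failures $s$, the digamma difference in the $d = 0$ bound reduces to a finite sum through the recurrence $\psi(x+1) = \psi(x) + x^{-1}$, so the bound can be evaluated without a special-function library. The following is a minimal sketch with hypothetical function names; it treats the bound as $\gamma\alpha\left(\psi(\alpha+(K+N)s) - \psi(\alpha+Ks)\right)$.

```python
def digamma_difference(a, n):
    """psi(a + n) - psi(a) for integer n >= 0, using psi(x + 1) = psi(x) + 1/x."""
    return sum(1.0 / (a + j) for j in range(n))


def nb_sb_bound_d0(N, K, s, gamma, alpha):
    """gamma * alpha * (psi(alpha + (K+N)s) - psi(alpha + K s)) for integer s >= 1."""
    return gamma * alpha * digamma_difference(alpha + K * s, N * s)
```

Evaluating `nb_sb_bound_d0` for increasing `K` shows the $\gamma\alpha N K^{-1}$ decay, with the dependence on $s$ vanishing in the limit.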

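Stochastic mapping. We can transform the gamma process $\Gamma\mathrm{P}(\gamma, \alpha, 0)$ into the beta process $\mathrm{BP}(\gamma, \alpha, 0)$ by applying the stochastic mapping

\[ \theta \mapsto \theta/(\theta + G), \quad G \sim \mathrm{Gam}(\alpha, \alpha), \qquad \kappa(\theta, \mathrm{d}u) = \mathrm{Gam}\left(\theta(1-u)/u;\ \alpha, \alpha\right)\frac{\theta}{u^{2}}\,\mathrm{d}u. \]

Using Lemma 4.2.4 yields $\kappa(\Theta) \sim \mathrm{BP}(\gamma, \alpha, 0)$. Applying this result to the Bondesson representation for $\Gamma\mathrm{P}(\gamma, \alpha, 0)$ yields

\[ \Theta = \sum_{k=1}^{\infty}\theta_k\,\delta_{\psi_k} \sim \mathrm{BP}(\gamma, \alpha, 0), \quad\text{with}\quad \theta_k := \left(1 + G_k V_k^{-1}e^{\Gamma_k/(\gamma\alpha)}\right)^{-1}, \quad \psi_k \overset{\text{i.i.d.}}{\sim} G, \]

\[ G_k \overset{\text{i.i.d.}}{\sim} \mathrm{Gam}(\alpha, \alpha), \qquad V_k \overset{\text{i.i.d.}}{\sim} \mathrm{Exp}(\alpha), \]

which, unlike the Bondesson representation, applies for all $\alpha > 0$.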
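The representation above translates directly into a sampler for the first $K$ rates. The sketch below rests on several assumptions not confirmed in this section: $\mathrm{Gam}(\alpha, \alpha)$ is read as shape $\alpha$ and rate $\alpha$, $\mathrm{Exp}(\alpha)$ as rate $\alpha$, and $\Gamma_k$ as the $k$-th arrival time of a unit-rate Poisson process. Atom locations $\psi_k \sim G$ are omitted because the base measure is problem-specific, and the function name is hypothetical.

```python
import math
import random


def sample_bp_stochastic_mapping(K, gamma, alpha, rng=None):
    """Draw theta_1..theta_K with theta_k = 1 / (1 + G_k * exp(Gamma_k/(gamma*alpha)) / V_k)."""
    rng = rng or random.Random()
    weights = []
    arrival = 0.0  # Gamma_k, accumulated from Exp(1) inter-arrival times (assumption)
    for _ in range(K):
        arrival += rng.expovariate(1.0)
        G = rng.gammavariate(alpha, 1.0 / alpha)  # shape alpha, scale 1/alpha (rate alpha)
        V = rng.expovariate(alpha)                # rate alpha
        # x = log(G / V) + Gamma_k / (gamma * alpha); theta = 1 / (1 + exp(x))
        x = math.log(G) - math.log(V) + arrival / (gamma * alpha)
        if x > 0:
            e = math.exp(-x)  # avoids overflow of exp(x) for large x
            weights.append(e / (1.0 + e))
        else:
            weights.append(1.0 / (1.0 + math.exp(x)))
    return weights
```

Passing a seeded `random.Random` instance makes draws reproducible, which is convenient when comparing truncation levels.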

Hyperpriors. Consider truncating the Bondesson representation of the beta process, but with a hyperprior on the mass parameter $\gamma$. A standard choice of hyperprior for $\gamma$ is a gamma distribution, i.e. $\gamma \sim \mathrm{Gam}(a, b)$. Combining Proposition 4.3.6 and the beta-Bernoulli truncation bound in Eq. (25), we have that


Bn,k  <  N- 
b 


I< 


3.6.2  Beta  prime  process 

The beta prime process $\mathrm{BPP}(\gamma, \alpha, d)$ [Broderick et al., 2017] with discount parameter $d \in [0, 1)$, concentration parameter $\alpha > -d$, and mass parameter $\gamma > 0$, is a CRM with rate measure

\[ \nu(\mathrm{d}\theta) = \gamma\,\frac{\Gamma(\alpha+1)}{\Gamma(1-d)\,\Gamma(\alpha+d)}\,\theta^{-1-d}(1+\theta)^{-\alpha}\,\mathrm{d}\theta. \]

The beta prime process is often paired with an odds Bernoulli likelihood,

\[ h(x \mid \theta) = \mathbb{1}[x \le 1]\,\theta^{x}(1+\theta)^{-1}, \]

in which case $\pi(\theta) = (1+\theta)^{-1}$. All truncation results are for the odds Bernoulli likelihood.

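Figure 2: Truncation error bounds for the beta prime-odds Bernoulli process.

Bondesson representation. If $d = 0$, then $\theta\nu(\theta) = \gamma\alpha(1+\theta)^{-\alpha}$ is non-increasing and $c_\nu = \lim_{\theta \to 0}\theta\nu(\theta) = \gamma\alpha$, so $g_\nu(v) = \alpha(1+v)^{-\alpha-1} = \mathrm{Beta}'(v; 1, \alpha)$. Thus, it follows from Theorem 4.2.1 that if $\Theta \leftarrow \text{B-Rep}(\gamma\alpha, \mathrm{Beta}'(1, \alpha))$, then $\Theta \sim \mathrm{BPP}(\gamma, \alpha, 0)$. For the truncation bound we have

\begin{align*}
B_{N,K} &= N\int_0^\infty \left(1 - \mathbb{E}\left[\pi\left(v e^{-G_K/(\gamma\alpha)}\right)\right]\right)\nu(\mathrm{d}v) \\
&= N\gamma\alpha\,\mathbb{E}\left[e^{-G_K/(\gamma\alpha)}\int_0^\infty (1+v)^{-\alpha}\left(1 + v e^{-G_K/(\gamma\alpha)}\right)^{-1}\mathrm{d}v\right] \\
&\le N\gamma\alpha\,\mathbb{E}\left[e^{-G_K/(\gamma\alpha)}\right]\int_0^\infty (1+v)^{-\alpha}\left(1 + v\,\mathbb{E}\left[e^{-G_K/(\gamma\alpha)}\right]\right)^{-1}\mathrm{d}v \\
&\le N\gamma\alpha\left(\frac{\gamma\alpha}{1+\gamma\alpha}\right)^{K}\int_0^\infty (1+v)^{-\alpha-1}\,\mathrm{d}v \\
&= N\gamma\left(\frac{\gamma\alpha}{1+\gamma\alpha}\right)^{K},
\end{align*}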


where  the  first  upper  bound  follows  from  Jensen’s  inequality.  Thus,  the  error  bound  is  the  same  as 
for  the  beta-Bernoulli  process. 

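Thinning representation. If we let $g = \mathrm{Beta}'(1-d, \alpha+d)$, then the thinning representation for $\mathrm{BPP}(\gamma, \alpha, d)$ is

\[ \Theta = \sum_{k=1}^{\infty} V_k\,\mathbb{1}\left(V_k(\Gamma_k - \gamma) \le \gamma\right)\delta_{\psi_k}, \quad\text{with}\quad V_k \overset{\text{i.i.d.}}{\sim} \mathrm{Beta}'(1-d, \alpha+d). \]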

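Rejection representation. For $d = 0$ and $\alpha > 1$, we take $\mu$ to be the rate measure for $\mathrm{LomP}(\gamma, 1)$, so the rejection representation is

\[ \Theta = \sum_{k=1}^{\infty} V_k\,\mathbb{1}\left(U_k \le (1+V_k)^{1-\alpha}\right)\delta_{\psi_k} \sim \mathrm{BPP}(\gamma, \alpha, 0), \qquad \alpha > 1, \]

with $V_k = \left(e^{\gamma^{-1}\alpha^{-1}\Gamma_k} - 1\right)^{-1}$ and $U_k \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}[0, 1]$.

The expected number of rejections is $c_\gamma + \psi(\alpha)$, where $c_\gamma$ is the Euler-Mascheroni constant and $\psi$ is the digamma function. Since $\psi(\alpha) \sim \log(\alpha)$ for $\alpha \to \infty$, the representation remains efficient even for fairly large values of $\alpha$. To obtain a truncation bound, we use the same approach as in Example 4.3.3:

\begin{align*}
\frac{B_{N,K}}{N\gamma\alpha} &\le \int_0^\infty F_K\left(\gamma\alpha\log(1 + x^{-1})\right)(1+x)^{-\alpha-1}\,\mathrm{d}x \\
&\le \int_0^a (1+x)^{-\alpha-1}\,\mathrm{d}x + F_K\left(\gamma\alpha\log(1 + a^{-1})\right)\int_a^\infty (1+x)^{-\alpha-1}\,\mathrm{d}x \\
&\le a + \alpha^{-1}\left(3\gamma\alpha\log(1 + a^{-1})/K\right)^{K}(1+a)^{-\alpha} \\
&\sim e^{-b} + \alpha^{-1}\left(3\gamma\alpha/K\right)^{K}b^{K},
\end{align*}

where $b := \log(1 + a^{-1})$. Setting the two terms equal and solving for $b$, we conclude that

\[ B_{N,K} \le 2N\gamma\alpha\,e^{-K W_0\left(\{3\gamma\alpha\}^{-1}\right)}, \]

where $W_0$ is the product log function, as defined in Eq. (11).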

Similarly  to  Example  4.2.5,  for  the  case  of  d  >  0  and  a  >  0,  we  instead  use  fi(c\6)  = 
Since  /^( u )  =  (Ym-1)1^,  where  7'  :=  7dr(i-a)r(l+ri)’ the  rejection  rep- 


'  r(l-d)r(a+d) 

resentation  is 


0  =  X]  Vkt  (Uk  <  (1  +  Vk)~a)S 


with 


fc=i 


Vi  =  (T,r(71)1/d, 

Vi  ~  Unif [0, 1], 


The  expected  number  of  rejections  is  so  the  representation  is  efficient  for  large  d.  but  extremely 
inefficient  when  d  is  small.  We  have 

r(<a  +  1) 


Bn,k  =  N  7- 


FK{ix-d)x~d{  1  + 


T(1  —  d)T(a  +  d )  70 

Following  the  approach  of  Example  4.3.4,  the  integral  can  be  upper  bounded  as 


\  — 1  — Q 


/  x  ddx  +  FK(ry'a  )  x  d(l+x) 

JO  Ja 

<  (1  —  d)~laJ^d  +  a~1a^d(3r)' K~la~d)K 

Setting  the  two  terms  equal  and  solving  for  a,  we  obtain 

7T(a  +  1) 

VI)) 

-l/d 


dx 


Bn,k  <  2Ar 
~  2N 


r(l  -d)T(a  +  d)  J 
7r(a  +  1)  '  1+1/d 

r(2  —  d)T(a  +  d) 


(d+l)K+l 

dK+i  ,  f  dK 

(a(l-d)dK)~^hv  (  EfL 


K 

'  dK+ 1 


dK 


3(1  -dy 


K  — >  00. 


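Decoupled Bondesson representation. It follows from Theorem 4.2.3 and the same arguments as those in the Bondesson case that if $\Theta \leftarrow \text{DB-Rep}(\gamma\alpha, \mathrm{Beta}'(1, \alpha), \xi)$, then $\Theta \sim \mathrm{BPP}(\gamma, \alpha, 0)$. Using the trivial bound $\theta/(1+\theta) \le \theta$ and calculations analogous to those in the beta-Bernoulli case, for $\alpha > 1$ we obtain the upper bound

\[ B_{N,K} \le 1 - \exp\left\{-N\,\frac{\gamma\alpha}{\alpha-1}\left(\frac{\xi}{1+\xi}\right)^{K}\right\} \sim N\,\frac{\gamma\alpha}{\alpha-1}\left(\frac{\xi}{1+\xi}\right)^{K}, \qquad K \to \infty. \]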
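Power-law representation. We can transform the gamma process $\Gamma\mathrm{P}(\gamma, 1, d)$ into the beta prime process $\mathrm{BPP}(\gamma, \alpha, d)$ by applying the stochastic mapping

\[ \theta \mapsto u = \theta/G, \quad G \sim \mathrm{Gam}(\alpha+d, 1), \qquad \kappa(\theta, \mathrm{d}u) = \mathrm{Gam}\left(\theta/u;\ \alpha+d, 1\right)\frac{\theta}{u^{2}}\,\mathrm{d}u. \]

Using Lemma 4.2.4 yields $\kappa(\Theta) \sim \mathrm{BPP}(\gamma, \alpha, d)$. Applying this result to the power-law representation of $\Gamma\mathrm{P}(\gamma, 1, d)$ from Example 4.2.8 yields the novel power-law representation

\[ \Theta \leftarrow \text{PL-Rep}(\gamma\alpha, 1, d, \mathrm{Beta}'(1, \alpha+d)) \quad\text{implies}\quad \Theta \sim \mathrm{BPP}(\gamma, \alpha, d). \]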


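Using the trivial bound $\theta/(1+\theta) \le \theta$ and calculations analogous to those in the beta-Bernoulli case, for $\alpha > 1$ we obtain the upper bound and asymptotic simplification

\[ B_{N,K} \le N\,\frac{\gamma\alpha}{\alpha-1}\prod_{k=1}^{K}\frac{1 + kd}{2 + kd - d} \sim \frac{\gamma N\alpha}{\alpha-1}\begin{cases} 2^{-K} & d = 0 \\ \dfrac{\Gamma(2/d)}{\Gamma(1 + 1/d)}\,K^{1-1/d} & 0 < d < 1, \end{cases} \qquad K \to \infty. \]

Size-biased representation. We have

\[ \eta_{k1} = \eta_k = \int \pi(\theta)^{k-1}\left(1 - \pi(\theta)\right)\nu(\mathrm{d}\theta) = \gamma\,\frac{\Gamma(\alpha+1)\,\Gamma(d+k+\alpha-1)}{\Gamma(\alpha+d)\,\Gamma(k+\alpha)}, \]

which is the same as for the beta-Bernoulli process. Thus, the error bound is also the same as the beta-Bernoulli case.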
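Returning to the power-law bound for the beta prime process, the product $\prod_{k=1}^{K}(1+kd)/(2+kd-d)$ is easy to evaluate directly, and for $0 < d < 1$ it can be written as a ratio of gamma functions whose large-$K$ behavior is $\frac{\Gamma(2/d)}{\Gamma(1+1/d)}K^{1-1/d}$. The sketch below is illustrative (the function names are assumptions) and works in log space to avoid underflow.

```python
import math


def bpp_power_law_bound(N, K, gamma, alpha, d):
    """N * gamma*alpha/(alpha-1) * prod_{k=1}^K (1 + k d)/(2 + k d - d), for alpha > 1."""
    log_prod = sum(math.log1p(k * d) - math.log(2.0 + k * d - d) for k in range(1, K + 1))
    return N * gamma * alpha / (alpha - 1.0) * math.exp(log_prod)


def product_asymptotic(K, d):
    """Large-K form of the product: 2^(-K) if d == 0, else Gamma(2/d)/Gamma(1+1/d) * K^(1-1/d)."""
    if d == 0:
        return 2.0 ** (-K)
    return math.exp(math.lgamma(2.0 / d) - math.lgamma(1.0 + 1.0 / d)) * K ** (1.0 - 1.0 / d)
```

For $d = 1/2$ the product telescopes to $3/(K+3)$, which matches the asymptotic form $3K^{-1}$ and illustrates the switch from geometric decay at $d = 0$ to polynomial decay for $d > 0$.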
Results and Discussion


Table 1 summarizes our truncation and simulation cost results as applied to the beta, (normalized) gamma, and beta prime processes. Results for the Bondesson representation of $\mathrm{BP}(\gamma, 1, 0)$ as well as the decoupled Bondesson representations of $\mathrm{BP}(\gamma, \lambda, 0)$ and $\Gamma\mathrm{P}(\gamma, \lambda, 0)$ were previously known, and are reproduced by our results. All other results in the table are novel to the best of the authors' knowledge. It is interesting to note that the bounds and expected costs within each of the representation classes often have the same form, aside from some constants. Across classes, however, they vary significantly, indicating that the chosen sequential representation of a process has more of an influence on the truncation error than the process itself.

Fig. 3 shows a comparison of how the truncation error bounds vary with the expected computational cost $\mathbb{E}[R]$ of simulation for the (normalized) gamma process and Poisson likelihood with $N = 5$ observations. Results shown for the thinning, rejection, and inverse-Lévy representations are computed by Monte-Carlo approximation of the formula for $B_{N,K}$ in Eq. (8), while all others use closed-form expressions from the examples in Sections 4.3 and 4.4. Note that the Bondesson and decoupled Bondesson representations do not exist when $d > 0$. Further, only those representations for which we provide closed-form bounds in the examples are shown for the normalized gamma process; we leave the numerical approximation of the results from Theorems 4.4.4 and 4.4.5 as an open problem. Similar figures for other processes (in particular, the beta-Bernoulli and beta prime-odds Bernoulli) are provided in Section 4.6. Note that all bounds presented are improved by a factor of two versus comparable past results in the literature, due to the reliance on Lemmas 4.3.1 and 4.4.1 rather than the earlier bound found in Ishwaran and James [2001].

In Fig. 3, the top row shows results for the light-tailed process ($\gamma = 1$, $\lambda = 1$, $d = 0$, and $\xi = c = \gamma\lambda$). All representations except for thinning and size-biased capture its exponential truncation error decay. This is due to the fact that the thinning representation generates increasingly many atoms of weight 0 as $K \to \infty$, and the expected number of atoms at each outer index for the size-biased representation decays as $K \to \infty$. The inverse-Lévy representation has the lowest truncation error as expected, as it is the only representation that generates a nonincreasing sequence of weights (and so must be the most efficient [Arbel and Prünster, 2017]). Based on this figure and those in Section 4.6 for other processes, it appears that the Bondesson representation typically provides the best tradeoff between simplicity and efficiency, and should be used whenever its conditions in Theorem 4.2.1 are satisfied. When the technical conditions are not satisfied, the rejection representation is a good alternative. If ease of theoretical analysis is a concern, the decoupled Bondesson representation provides comparable efficiency with the analytical simplicity of a superposition representation.

The bottom two rows of Fig. 3 show results for the heavy-tailed process ($\gamma = 1$, $\lambda = 2$, and $d \in \{0.1, 0.5\}$). The representation options are more limited, as the technical conditions



Figure 3: Truncation error bounds for representations of the (normalized) gamma-Poisson process, with $\gamma = 1$, $\lambda = 2$, and $\xi = \gamma\lambda$. The left column is for the unnormalized process, while the right column is for the normalized process. Each row displays results for a different value of the discount parameter $d \in \{0, 0.1, 0.5\}$.

Table 1: Asymptotic error bounds and simulation cost summary. Error bounds are presented up to a constant that varies between models. Be = Bernoulli, OBe = odds Bernoulli, Poi = Poisson.


Rep. 

Random  Measure 

h 

Asymptotic  Error  Bound 

Complexity 

IL 

LomP(7,  A-1) 

Poi 

Ne~KW o({37Amax(A,e)}_1) 

2  K 

B 

BP(7,A>1,0) 

PP(7,A,0) 

BPP(7,A,0) 

DP  (7) 

Be 

Poi 

OBe 

h&y 

3  K 

T 

— 

— 

See  Eq.  (12) 

3  K 

R 

BP (7,  A,  0) 
PP(7,A,0) 

bpp(7,  a,  0) 

Be 

Poi 

OBe 

r  e-^a^A}-1)  d  =  0 
n  I  K~dd~d)  d  >  0  (rp) 

l  K~xld  d  >  0  (BP,  BPP) 

3  K 

DB 

BP(7,A>  1,0) 
PP(7,A,0) 
BPP(7,A  >  1,0) 

DP  (7) 

Be 

Poi 

OBe 

-(A)' 

(3cc  +  i)  K 

SB 

BP  (7,  A,  d) 
PP(7,A  ,d) 
BPP(7j  A,  d) 

NrP(7,A,d) 

Be 

Poi 

OBe 

NIC1-1 

K 

f  K~L  log  K  d  =  0, 7A  =  1 

N  1  K~  min(i,7A)  d  =  o;  7a  ^  l 

[  A'd 1  d>  0 

PL 

BP  (7,  A,  d) 
PP(7,A  ,d) 
BPP(7,A  >  1,  d) 

Be 

Poi 

OBe 

f  (aTt)^  4  =  0 (BP, PP) 

NS  2~k  d  =  0  (BPP) 

[  K1-1^  d>  0 

1 K 2 

21V 

of the Bondesson and decoupled Bondesson representations are not satisfied. Here the rejection representation is often the best choice due to its simplicity and competitive performance with the inverse-Lévy representation. However, one must take care to check its efficiency beforehand using Proposition 4.2.2 given a particular choice of $\mu(\mathrm{d}\theta)$. For example, the choice of $\mu(\mathrm{d}\theta) \propto \theta^{-1-d}\,\mathrm{d}\theta$ in the present work makes the rejection representation very inefficient when $d \ll 1$ for both the gamma-Poisson (Fig. 3) and beta prime-odds Bernoulli (Fig. 2) processes, but efficient for the beta-Bernoulli process (Fig. 1). If no $\mu(\mathrm{d}\theta)$ yields reasonable results, the power-law representation is a good choice for $d \ll 1$ as its truncation bound approaches the exponential decay of the light-tailed process. For larger $d > 0$ the size-biased representation is a good alternative.

Based on the results in Fig. 3, it appears that there is no single dominant representation for all situations (provided the inverse-Lévy representation is intractable, as it most often is). However, as a guideline, the rejection and Bondesson representations tend to be good choices for light-tailed processes, while the rejection, size-biased, and power-law representations are good choices for heavy-tailed processes.

Conclusions


We have investigated sequential representations, truncation error bounds, and simulation algorithms for (normalized) completely random measures. In past work, the development and analysis of these tools has occurred only on an ad hoc basis. The results in the present paper, in contrast, provide a comprehensive characterization and analysis of the different types of sequential (N)CRM representations available to the practitioner. However, there are a number of remaining open questions and limitations.

First, this work does not consider the influence of observed data: all analyses assume an a priori perspective, as truncation is typically performed before data are incorporated via posterior inference (e.g. in variational inference for the DP mixture [Blei and Jordan, 2006] and BP latent feature model [Doshi-Velez et al., 2009]). However, a posteriori truncation has been analyzed in past work as well [Ishwaran and James, 2001, Gelfand and Kottas, 2002, Ishwaran and Zarepour, 2002]. In the language of CRMs, observations introduce a fixed-location component in the posterior process, while the unobserved traits are drawn from the (possibly normalized) ordinary component of a CRM [Ishwaran and Zarepour, 2002, Broderick et al., 2017]. We anticipate that this property makes observations reasonably simple to include: the truncation tools provided in the present paper can be used directly on the unobserved ordinary component, while the fixed-location component may be treated exactly.

In addition, there are important open questions regarding the sequential representations developed in this work. It is unknown whether generalized versions of the Bondesson and decoupled Bondesson representations can be developed for larger classes of rate measures. The power-law representation does provide a partial answer in the decoupled Bondesson case. Regarding size-biased representations, one might expect that the use of conjugate exponential family CRMs [Broderick et al., 2017] would yield a closed-form expression for the truncation bound. In all of the cases provided in this paper, this expectation held: the integrals were evaluated exactly and a closed-form expression was found. However, we were unable to identify a general expression applicable to all conjugate exponential family CRMs. Based on the examples provided, we conjecture that such an expression exists. Finally, fundamental connections between some of the representations were left largely unexplored in this work. This is an open area of research, although progress has been made by connecting decoupled Bondesson and size-biased representations for (hierarchies of) generalized beta processes [Roy, 2014, Sec. 6.4].

A final remark is that one of the primary uses of sequential representations in past work has been in the development of posterior inference procedures [Paisley et al., 2010, Blei and Jordan, 2006, Doshi-Velez et al., 2009]. The present work provides no guidance on which truncated representations are best paired with which inference methods. We leave this as an open direction for future research, which will require both theoretical and empirical investigation.

Technical Appendices


A.1 Technical lemmas

Lemma A.1.1. If $Y_k$ is a uniformly bounded, non-negative sequence of random variables such that $\lim_{k \to \infty}\mathbb{E}[Y_k] = 0$, then $Y_k \overset{p}{\to} 0$.

Proof. Without loss of generality we assume that $Y_k \in [0, 1]$ a.s. Then for all $\epsilon, \delta > 0$, by hypothesis there exists $k'$ such that for all $k \ge k'$, $\mathbb{E}[Y_k] \le \epsilon\delta$. It then follows from Markov's inequality that for all $k \ge k'$, $\mathbb{P}(Y_k > \epsilon) \le \delta$. □

Lemma A.1.2. If $\mu$ is a non-atomic measure on $\mathbb{R}^d$, then for any $x \in \mathbb{R}^d$ and $\delta > 0$, there exists $\epsilon_{x,\delta} > 0$ such that $\mu(\{y \in \mathbb{R}^d \mid \|x - y\|_2 \le \epsilon_{x,\delta}\}) \le \delta$.

Proof. Without loss of generality let $x = 0$. Suppose the implication does not hold. Then there exists $\delta > 0$ such that for all $\epsilon > 0$, $\mu(B_\epsilon) > \delta$, where $B_\epsilon := \{y \in \mathbb{R}^d \mid \|y\|_2 \le \epsilon\}$. Let $\epsilon_n$ be a sequence such that $\epsilon_n \to 0$ as $n \to \infty$. Then by continuity $\mu(\{0\}) = \lim_{n \to \infty}\mu(B_{\epsilon_n}) \ge \delta$, hence $\mu$ is atomic, which is a contradiction. □

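Lemma A.1.3. If $\nu(\mathrm{d}\theta)$, an absolutely continuous $\sigma$-finite measure on $\mathbb{R}_+$, and continuous $\phi : \mathbb{R}_+ \to [0, 1]$ satisfy

\[ \nu(\mathbb{R}_+) = \infty, \qquad \int \min(1, \theta)\,\nu(\mathrm{d}\theta) < \infty, \qquad\text{and}\qquad \int \phi(\theta)\,\nu(\mathrm{d}\theta) < \infty, \]

then

\[ \lim_{\theta \to 0}\phi(\theta) = \phi(0) = 0. \]

Proof. $\int \min(1, \theta)\,\nu(\mathrm{d}\theta) < \infty$ implies that $\int_0^1 \theta\,\nu(\mathrm{d}\theta) < \infty$ and $\int_1^\infty \nu(\mathrm{d}\theta) < \infty$. But $\nu(\mathbb{R}_+) = \infty$, so $\int_0^1 \nu(\mathrm{d}\theta) = \infty$. In fact, for all $\epsilon > 0$, $\nu([0, \epsilon]) = \infty$ since otherwise $\int_\epsilon^1 \nu(\mathrm{d}\theta) = \infty \implies \int_\epsilon^1 \theta\,\nu(\mathrm{d}\theta) = \infty$, a contradiction. Since $\phi$ is continuous and has bounded range, $\lim_{\theta \to 0}\phi(\theta) = c$ exists and is finite. Assume $c > 0$, so $\exists\,\epsilon > 0$ such that $\forall\,\epsilon' \le \epsilon$, $|\phi(\epsilon') - c| < c/2$, and in particular $\phi(\epsilon') > c/2$. Thus, for any $\theta' \le \epsilon$, $\int_0^{\theta'}\phi(\theta)\,\nu(\mathrm{d}\theta) \ge \frac{c}{2}\int_0^{\theta'}\nu(\mathrm{d}\theta) = \infty$, a contradiction. Thus, $c = 0$. □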

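Lemma A.1.4. Assume $\phi(x)$ is a twice continuously differentiable function with the following properties:

1. $\phi''(x)/\phi'(x) \to 0$ as $x \to \infty$

2. for all $\delta > 0$ there exists $B_\delta > 0$ such that for any increasing sequence $(x_n)_{n=1}^{\infty}$,

\[ \limsup_{n \to \infty}\ \sup_{y \in [x_n, x_n + \delta]}\left|\frac{\phi''(y)}{\phi''(x_n)}\right| = B_\delta < \infty. \]

Then for any constant $c > 0$ and any increasing sequence $(x_n)_{n=1}^{\infty}$,

\[ \phi(x_n + c) - \phi(x_n) \sim c\,\phi'(x_n) \qquad\text{for } n \to \infty. \]

Proof. A second-order Taylor expansion of $\phi(x_n + c)$ about $x_n$ yields

\[ \phi(x_n + c) - \phi(x_n) = c\,\phi'(x_n) + \frac{c^2}{2}\phi''(x_n^*), \]

where $x_n^* \in [x_n, x_n + c]$. Our assumptions on $\phi$ ensure that

\[ \lim_{n \to \infty}\left|\frac{\phi''(x_n^*)}{\phi'(x_n)}\right| \le \lim_{n \to \infty} B_c\left|\frac{\phi''(x_n)}{\phi'(x_n)}\right| = 0, \]

and hence

\[ \lim_{n \to \infty}\frac{\phi(x_n + c) - \phi(x_n)}{c\,\phi'(x_n)} = \lim_{n \to \infty}\left(1 + \frac{c}{2}\,\frac{\phi''(x_n^*)}{\phi'(x_n)}\right) = 1. \qquad □ \]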

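Lemma A.1.5 (Gautschi [1959]).

\[ (1 + x)^{d-1} \le \frac{\Gamma(x + d)}{\Gamma(x + 1)} \le x^{d-1}, \qquad 0 \le d \le 1,\ x \ge 1, \]

and thus for $0 \le d \le 1$,

\[ \frac{\Gamma(x + d)}{\Gamma(x + 1)} \sim x^{d-1}, \qquad x \to \infty. \]

Lemma A.1.6. For $\alpha > 0$ and $x \ge -1$,

\[ \sum_{m=1}^{M}\frac{\Gamma(\alpha + m + x)}{\Gamma(\alpha + m)} = \begin{cases} \dfrac{1}{x+1}\left(\dfrac{\Gamma(\alpha + M + x + 1)}{\Gamma(\alpha + M)} - \dfrac{\Gamma(\alpha + x + 1)}{\Gamma(\alpha)}\right) & x > -1 \\[2ex] \psi(\alpha + M) - \psi(\alpha) & x = -1, \end{cases} \]

where $\psi(\cdot)$ is the digamma function.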

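Proof. When $M = 1$ and $x > -1$, analyzing the right hand side yields

\[ \begin{aligned}
\frac{\Gamma(\alpha + M + x + 1)}{\Gamma(\alpha + M)} - \frac{\Gamma(\alpha + x + 1)}{\Gamma(\alpha)}
&= \frac{\Gamma(\alpha + x + 2)}{\Gamma(\alpha + 1)} - \frac{\Gamma(\alpha + x + 1)}{\Gamma(\alpha)} \\
&= \frac{\Gamma(\alpha + x + 1)}{\Gamma(\alpha)}\left(\frac{\alpha + x + 1}{\alpha} - 1\right) \\
&= (x + 1)\,\frac{\Gamma(\alpha + x + 1)}{\Gamma(\alpha + 1)},
\end{aligned} \]

which, after dividing by $x + 1$, equals the left hand side. By induction, supposing that the result is true for $M - 1 \ge 1$ and $x > -1$,

\[ \begin{aligned}
\sum_{m=1}^{M} \frac{\Gamma(\alpha + m + x)}{\Gamma(\alpha + m)}
&= \sum_{m=1}^{M-1} \frac{\Gamma(\alpha + m + x)}{\Gamma(\alpha + m)} + \frac{\Gamma(\alpha + M + x)}{\Gamma(\alpha + M)} \\
&= \frac{1}{1+x}\left(\frac{\Gamma(\alpha + M + x)}{\Gamma(\alpha + M - 1)} - \frac{\Gamma(\alpha + x + 1)}{\Gamma(\alpha)}\right) + \frac{\Gamma(\alpha + M + x)}{\Gamma(\alpha + M)} \\
&= \frac{\Gamma(\alpha + M + x)}{\Gamma(\alpha + M - 1)}\cdot\frac{\alpha + M + x}{(1 + x)(\alpha + M - 1)} - \frac{\Gamma(\alpha + x + 1)}{(1 + x)\Gamma(\alpha)} \\
&= \frac{1}{1+x}\left(\frac{\Gamma(\alpha + M + x + 1)}{\Gamma(\alpha + M)} - \frac{\Gamma(\alpha + x + 1)}{\Gamma(\alpha)}\right).
\end{aligned} \]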

This demonstrates the desired result for $x > -1$. Next, when $x = -1$, we have that

\[ \sum_{m=1}^{M} \frac{\Gamma(\alpha + m - 1)}{\Gamma(\alpha + m)} = \sum_{m=1}^{M} \frac{1}{\alpha + m - 1}. \]

We proceed by induction once again. For $M = 1$, using the recurrence relation $\psi(x + 1) = \psi(x) + x^{-1}$ [Abramowitz and Stegun, 1964, Chapter 6], the right hand side evaluates to

\[ \psi(\alpha + 1) - \psi(\alpha) = \psi(\alpha) + \alpha^{-1} - \psi(\alpha) = \alpha^{-1}. \]

Supposing that the result is true for $M - 1 \ge 1$ and $x = -1$,

\[ \begin{aligned}
\sum_{m=1}^{M} \frac{1}{\alpha + m - 1}
&= \sum_{m=1}^{M-1} \frac{1}{\alpha + m - 1} + \frac{1}{\alpha + M - 1} \\
&= \psi(\alpha + M - 1) - \psi(\alpha) + \frac{1}{\alpha + M - 1} \\
&= \psi(\alpha + M) - \psi(\alpha),
\end{aligned} \]

demonstrating the result for $x = -1$. □
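A quick numerical sanity check of the identity in Lemma A.1.6 can be written with the Python standard library alone. This is a sketch under stated assumptions: the function names are ours, and because the standard library has no digamma function, $\psi$ is approximated here by a central difference of `math.lgamma`, which is accurate only to roughly $10^{-9}$.

```python
import math

def gamma_ratio_sum(alpha, M, x):
    """Left-hand side: sum_{m=1}^M Gamma(alpha + m + x) / Gamma(alpha + m)."""
    return sum(math.exp(math.lgamma(alpha + m + x) - math.lgamma(alpha + m))
               for m in range(1, M + 1))

def digamma_fd(z, h=1e-5):
    """Approximate digamma(z) by a central difference of log-gamma (assumption: z > h)."""
    return (math.lgamma(z + h) - math.lgamma(z - h)) / (2.0 * h)

def closed_form(alpha, M, x):
    """Right-hand side of the lemma for x > -1 and for x == -1."""
    if x > -1:
        top = math.exp(math.lgamma(alpha + M + x + 1) - math.lgamma(alpha + M))
        bottom = math.exp(math.lgamma(alpha + x + 1) - math.lgamma(alpha))
        return (top - bottom) / (x + 1)
    return digamma_fd(alpha + M) - digamma_fd(alpha)

if __name__ == "__main__":
    for alpha, M, x in [(0.7, 5, 0.3), (2.0, 10, -0.5), (0.7, 5, -1.0)]:
        print(alpha, M, x, gamma_ratio_sum(alpha, M, x), closed_form(alpha, M, x))
```

Both columns should agree up to the finite-difference error in the $x = -1$ case and up to floating-point rounding otherwise.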

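Lemma A.1.7. For $\alpha > 0$, $d \in \mathbb{R}$, and $x_n \to \infty$,

\[ \frac{d}{dx_n}\,\frac{\Gamma(\alpha + x_n + d)}{\Gamma(\alpha + x_n)} \sim d\,x_n^{d-1}. \]

Proof. We have

\[ \frac{d}{dx}\,\frac{\Gamma(\alpha + x + d)}{\Gamma(\alpha + x)} = \frac{\Gamma(\alpha + x + d)}{\Gamma(\alpha + x)}\,\bigl(\psi(\alpha + x + d) - \psi(\alpha + x)\bigr), \]

where $\psi$ is the digamma function. Using Lemma A.1.4 and the asymptotic expansion of $\psi'$ [Abramowitz and Stegun, 1964, Chapter 6], we obtain

\[ \psi(\alpha + x_n + d) - \psi(\alpha + x_n) \sim d\,\psi'(\alpha + x_n) \sim \frac{d}{x_n + \alpha} \sim d\,x_n^{-1}. \]

Since

\[ \frac{\Gamma(\alpha + x_n + d)}{\Gamma(\alpha + x_n)} \sim (\alpha + x_n)^d \sim x_n^d, \]

using Lemma A.1.9(2) with the previous two displays yields the result. □

Lemma A.1.8. For $0 \le d < 1$,

\[ \int_0^\infty \bigl(e^{-t\theta} - 1\bigr)\,\theta^{-1-d} e^{-\lambda\theta}\,d\theta =
\begin{cases}
\Gamma(-d)\,\bigl((\lambda + t)^d - \lambda^d\bigr), & 0 < d < 1, \\[1ex]
\log\!\left(\dfrac{\lambda}{\lambda + t}\right), & d = 0.
\end{cases} \]

Proof. By integration by parts and the standard gamma integral,

\[ \begin{aligned}
\int_0^\infty \bigl(e^{-t\theta} - 1\bigr)\,\theta^{-1-d} e^{-\lambda\theta}\,d\theta
&= -\frac{1}{d}\int_0^\infty \bigl[(\lambda + t)e^{-(\lambda + t)\theta} - \lambda e^{-\lambda\theta}\bigr]\,\theta^{-d}\,d\theta \\
&= \Gamma(-d)\,\bigl((\lambda + t)^d - \lambda^d\bigr).
\end{aligned} \]

Taking the limit as $d \to 0$ via L'Hospital's rule yields

\[ \lim_{d \to 0} \Gamma(-d)\,\bigl((\lambda + t)^d - \lambda^d\bigr) = \log\!\left(\frac{\lambda}{\lambda + t}\right). \]

□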
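The closed form in Lemma A.1.8 can be checked against direct numerical integration. The following is a minimal sketch, not part of the original analysis: it uses a plain midpoint rule after the substitution $\theta = s^{1/(1-d)}$ (our choice, made to remove the integrable $\theta^{-d}$ singularity at the origin) and truncates the integral where $e^{-\lambda\theta}$ is negligible. The function names and the parameter values in the usage lines are hypothetical.

```python
import math

def lhs_integral(t, lam, d, n=100_000):
    """Midpoint estimate of int_0^inf (e^{-t*theta} - 1) theta^{-1-d} e^{-lam*theta} dtheta."""
    m = 1.0 / (1.0 - d)                # theta = s**m flattens the theta^{-d} singularity
    s_max = (50.0 / lam) ** (1.0 / m)  # beyond theta = 50/lam the integrand is negligible
    h = s_max / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        theta = s ** m
        jacobian = m * s ** (m - 1.0)  # d(theta)/ds
        total += (math.expm1(-t * theta) * theta ** (-1.0 - d)
                  * math.exp(-lam * theta) * jacobian)
    return total * h

def rhs_closed_form(t, lam, d):
    """Closed form from the lemma: the d = 0 case is the logarithmic limit."""
    if d == 0.0:
        return math.log(lam / (lam + t))
    return math.gamma(-d) * ((lam + t) ** d - lam ** d)

if __name__ == "__main__":
    for d in (0.5, 0.0):
        print(d, lhs_integral(2.0, 1.0, d), rhs_closed_form(2.0, 1.0, d))
```

Both sides are negative, as expected, since $e^{-t\theta} - 1 \le 0$ and $\Gamma(-d) < 0$ for $0 < d < 1$.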


Lemma A.1.9 (Standard asymptotic equivalence properties).

1. If $a_n \sim b_n$ and $b_n \sim c_n$, then $a_n \sim c_n$.

2. If $a_n \sim b_n$ and $c_n \sim d_n$, then $a_n c_n \sim b_n d_n$.

3. If $a_n \sim b_n$, $c_n \sim d_n$ and $a_n c_n > 0$, then $a_n + c_n \sim b_n + d_n$.
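The sign condition $a_n c_n > 0$ in property 3 cannot be dropped. A small counterexample of our own (not from the original text) makes this concrete: cancellation between terms of opposite sign can destroy the equivalence.

```latex
% Counterexample showing why property 3 needs a_n c_n > 0.
\[
  a_n = n + 1, \qquad b_n = n + 2, \qquad c_n = d_n = -n .
\]
% Here a_n ~ b_n and c_n ~ d_n, but a_n c_n < 0, and
\[
  a_n + c_n = 1, \qquad b_n + d_n = 2,
  \qquad \frac{a_n + c_n}{b_n + d_n} = \frac{1}{2} \not\to 1 .
\]
```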


A.2  Proofs  of  sequential  representation  results 

A.2.1  Correctness  of  B-Rep,  DB-Rep,  and  PL-Rep 

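Proof of Theorem 4.2.1. First, we show that $g_\nu(v)$ is a density. Since $v\nu(v)$ is nonincreasing, $\frac{d}{dv}[v\nu(v)]$ exists almost everywhere, $\frac{d}{dv}[v\nu(v)] \le 0$ and hence $g_\nu(v) \ge 0$. Furthermore,

\[ \int_0^\infty g_\nu(v)\,dv = -c_\nu^{-1}\int_0^\infty \frac{d}{dv}[v\nu(v)]\,dv = -c_\nu^{-1}\,v\nu(v)\Big|_{v=0}^{\infty} = 1, \]

where the final equality follows from the assumed behavior of $v\nu(v)$ at $0$ and $\infty$. Since for a partition $A_1, \dots, A_n$, the random variables $\Theta(A_1), \dots, \Theta(A_n)$ are independent, it suffices to show that for any measurable set $A$ (with complement $\bar{A}$), the random variable $\Theta(A)$ has the correct characteristic function. Define the family of random measures

\[ \Theta_t = \sum_{k=1}^\infty V_k e^{-(\Gamma_k + t)/c_\nu}\,\delta_{\psi_k}, \quad t \ge 0, \]

so $\Theta_0 = \Theta$. Conditioning on $\Gamma_1$,

\[ (\Theta_t(A) \mid \Gamma_1 = u) \overset{d}{=} V_1 e^{-(u+t)/c_\nu}\,\mathbb{1}[\psi_1 \in A] + \Theta_{t+u}(A), \]

and note that the two terms on the right hand side are independent. We can thus write the characteristic function of $\Theta_t(A)$ as

\[ \begin{aligned}
\varphi(\xi, t, A) &:= E\bigl[e^{i\xi\Theta_t(A)}\bigr] \\
&= E\bigl[E\bigl[e^{i\xi\Theta_t(A)} \mid \Gamma_1 = u\bigr]\bigr] \\
&= E\bigl[E\bigl[e^{i\xi V_1 e^{-(u+t)/c_\nu}\mathbb{1}[\psi_1 \in A]}\,e^{i\xi\Theta_{t+u}(A)} \mid \Gamma_1 = u\bigr]\bigr] \\
&= E\bigl[E\bigl[\bigl(G(A)\,e^{i\xi V_1 e^{-(u+t)/c_\nu}} + G(\bar{A})\bigr)\,\varphi(\xi, t+u, A) \mid \Gamma_1 = u\bigr]\bigr] \\
&= G(A)\int_0^\infty\!\!\int_0^\infty e^{i\xi v e^{-(u+t)/c_\nu}}\,\varphi(\xi, t+u, A)\,g_\nu(v)\,e^{-u}\,du\,dv \\
&\qquad + G(\bar{A})\int_0^\infty \varphi(\xi, t+u, A)\,e^{-u}\,du,
\end{aligned} \]

where $\bar{A}$ is the complement of $A$. Multiplying both sides by $e^{-t}$ and making the change of variable $w = u + t$ yields

\[ \begin{aligned}
e^{-t}\varphi(\xi, t, A) &= G(A)\int_t^\infty\!\!\int_0^\infty e^{i\xi v e^{-w/c_\nu}}\,\varphi(\xi, w, A)\,g_\nu(v)\,e^{-w}\,dv\,dw + G(\bar{A})\int_t^\infty \varphi(\xi, w, A)\,e^{-w}\,dw \\
&= G(A)\int_t^\infty \varphi_{g_\nu}\bigl(\xi e^{-w/c_\nu}\bigr)\,\varphi(\xi, w, A)\,e^{-w}\,dw + G(\bar{A})\int_t^\infty \varphi(\xi, w, A)\,e^{-w}\,dw,
\end{aligned} \]

where $\varphi_{g_\nu}(a) := \int_0^\infty e^{iav} g_\nu(v)\,dv$ is the characteristic function of a random variable with density $g_\nu$. Differentiating both sides with respect to $t$ and rearranging yields

\[ \begin{aligned}
\frac{\partial \varphi(\xi, t, A)}{\partial t} &= \varphi(\xi, t, A) - G(A)\,\varphi_{g_\nu}\bigl(\xi e^{-t/c_\nu}\bigr)\,\varphi(\xi, t, A) - (1 - G(A))\,\varphi(\xi, t, A) \\
&= \varphi(\xi, t, A)\,G(A)\,\bigl(1 - \varphi_{g_\nu}(\xi e^{-t/c_\nu})\bigr),
\end{aligned} \]

so we conclude that

\[ \varphi(\xi, t, A) = \exp\left(-G(A)\int_t^\infty \bigl(1 - \varphi_{g_\nu}(\xi e^{-u/c_\nu})\bigr)\,du\right). \]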


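Using integration by parts and the definition of $g_\nu$, rewrite

\[ \begin{aligned}
\varphi_{g_\nu}(a) &= -c_\nu^{-1}\int_0^\infty \frac{d}{dv}[v\nu(v)]\,e^{iav}\,dv \\
&= -c_\nu^{-1}\,v\nu(v)\,e^{iav}\Big|_{v=0}^{\infty} + c_\nu^{-1}\int_0^\infty ia\,v\nu(v)\,e^{iav}\,dv \\
&= 1 + \int_0^\infty \frac{iav}{c_\nu}\,\nu(v)\,e^{iav}\,dv,
\end{aligned} \]

where the final equality follows from the assumed behavior of $v\nu(v)$ at $0$ and $\infty$. Combining the previous two displays and setting $t = 0$ concludes the proof:

\[ \begin{aligned}
\varphi(\xi, 0, A) &= \exp\left(G(A)\int_0^\infty\!\!\int_0^\infty \frac{i\xi v e^{-u/c_\nu}}{c_\nu}\,e^{i\xi v e^{-u/c_\nu}}\,\nu(v)\,du\,dv\right) \\
&= \exp\left(-G(A)\int_0^\infty\!\!\int_0^\infty \frac{d}{du}\Bigl[e^{i\xi v e^{-u/c_\nu}}\Bigr]\,\nu(v)\,du\,dv\right) \\
&= \exp\left(G(A)\int_0^\infty \bigl(e^{i\xi v} - 1\bigr)\,\nu(v)\,dv\right).
\end{aligned} \]

□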
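As a concrete instance of the density $g_\nu$ used in this proof, consider the gamma process. This worked example is ours; it assumes, as the normalization $\int g_\nu = 1$ requires, that $c_\nu = \lim_{v \to 0} v\nu(v)$ and that $v\nu(v) \to 0$ as $v \to \infty$, and it parametrizes the gamma process rate measure by a mass $\gamma > 0$ and rate $\lambda > 0$.

```latex
% Gamma process: \nu(v) = \gamma \lambda v^{-1} e^{-\lambda v}
\[
  v\,\nu(v) = \gamma\lambda\,e^{-\lambda v}
  \quad\Longrightarrow\quad
  c_\nu = \lim_{v \to 0} v\,\nu(v) = \gamma\lambda ,
\]
\[
  g_\nu(v) = -c_\nu^{-1}\,\frac{d}{dv}\bigl[v\,\nu(v)\bigr]
           = -\frac{1}{\gamma\lambda}\bigl(-\gamma\lambda^{2} e^{-\lambda v}\bigr)
           = \lambda\,e^{-\lambda v} .
\]
```

So in this case $v\nu(v)$ is nonincreasing and $g_\nu$ is the $\mathrm{Exp}(\lambda)$ density.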


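Proof of Theorem 4.2.3. It was already shown that $g_\nu(v)$ is a density. Let $\Theta'_k = \sum_{i=1}^{C_k} \theta_{ki}\,\delta_{\psi_{ki}}$, so $\Theta = \sum_{k=1}^\infty \Theta'_k$. Each $\Theta'_k$ is a CRM with rate measure $\frac{c_\nu}{\xi}\,\nu'_k(d\theta)$, where $\nu'_k(d\theta)$ is the law of $\theta_{ki}$. Using the product distribution formula we have

\[ \nu'_k(d\theta) = \int_0^1 \frac{\xi^k}{\Gamma(k)}\,(-\log w)^{k-1}\,w^{\xi - 2}\,g_\nu(\theta/w)\,dw\,d\theta. \tag{27} \]

Let $G_\nu(v) = \int_0^v g_\nu(x)\,dx$ be the cdf derived from $g_\nu$. From the preceding arguments, conclude that the rate measure of $\Theta$ is

\[ \begin{aligned}
\nu'(d\theta) &:= \frac{c_\nu}{\xi}\sum_{k=1}^\infty \nu'_k(d\theta) \\
&= c_\nu\int_0^1 \sum_{k=1}^\infty \frac{\xi^{k-1}}{\Gamma(k)}\,(-\log w)^{k-1}\,w^{\xi - 2}\,g_\nu(\theta/w)\,dw\,d\theta \\
&= c_\nu\int_0^1 w^{-2}\,g_\nu(\theta/w)\,dw\,d\theta \\
&= c_\nu\int_0^1 \theta^{-1}\,\frac{d}{dw}\bigl[-G_\nu(\theta/w)\bigr]\,dw\,d\theta \\
&= -c_\nu\,\theta^{-1}\,G_\nu(\theta/w)\Big|_{w=0}^{1}\,d\theta \\
&= c_\nu\,\theta^{-1}\,\bigl(1 - G_\nu(\theta)\bigr)\,d\theta.
\end{aligned} \]

The cdf can be rewritten as

\[ 1 - G_\nu(\theta) = 1 + c_\nu^{-1}\,x\nu(x)\Big|_{x=0}^{\theta} = c_\nu^{-1}\,\theta\,\nu(\theta). \]

Combining the previous two displays, conclude that $\nu'(d\theta) = \nu(\theta)\,d\theta$. □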

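Proof of Theorem 4.2.5. Since the power-law representation in Eq. (5) for the case when $V_{ki} = 1$ almost surely was previously shown to be $\mathrm{BP}(\gamma, \alpha, d)$ [Broderick et al., 2012], we simply apply the stochastic mapping result in Lemma 4.2.4 with $\kappa(\theta, du) = \theta^{-1} g_\nu(u\theta^{-1})\,du$, where $\theta^{-1} g_\nu(u\theta^{-1})$ is the density of $U = V\theta \mid \theta$ under $V \sim g_\nu$. □

A.2.2 Proof of the expected number of rejections of the R-Rep

Proof of Proposition 4.2.2. We have

\[ \begin{aligned}
E\left[\sum_{k=1}^\infty \mathbb{1}(\theta_k = 0)\right]
&= E\left[\sum_{k=1}^\infty \mathbb{1}\!\left(\frac{d\nu}{d\mu}(V_k) < U_k\right)\right] \\
&= E\left[\sum_{k=1}^\infty \left(1 - \frac{d\nu}{d\mu}(V_k)\right)\right] \\
&= \int_0^\infty \left(1 - \frac{d\nu}{d\mu}(x)\right)\mu(dx),
\end{aligned} \]

where the equalities follow from the definition of $\theta_k$, integrating out $U_k$, and applying Campbell's theorem. □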


A.2.3 Power-law behavior of the PL-Rep

We now formalize the sense in which power-law representations do in fact produce power-law behavior. Let $Z_n \mid \Theta \overset{iid}{\sim} \mathrm{LP}(\mathrm{Poiss}, \Theta)$ and $y_k := \sum_{n=1}^N \mathbb{1}[z_{nk} \ge 1]$. We analyze the number of non-zero features after $N$ observations,

\[ K_N := \sum_{k=1}^\infty \mathbb{1}[y_k \ge 1], \]

and the number of features appearing $j \ge 1$ times after $N$ observations,

\[ K_{N,j} := \sum_{k=1}^\infty \mathbb{1}[y_k = j]. \]

In their power law analysis of the beta process, Broderick et al. [2012] use a Bernoulli likelihood process. However, the Bernoulli process is only applicable if $\theta_k \in [0, 1]$, whereas in general $\theta_k \in \mathbb{R}_+$. Replacing the Bernoulli process with a Poisson likelihood process is a natural choice since $\mathbb{1}[z_{nk} \ge 1] \sim \mathrm{Bern}(1 - e^{-\theta_k})$, and asymptotically $1 - e^{-\theta_k} \sim \theta_k$ a.s. for $k \to \infty$ since $\lim_{k\to\infty} \theta_k = 0$ a.s. Thus, the Bernoulli and Poisson likelihood processes behave the same asymptotically, which is what is relevant to our asymptotic analysis. We are therefore able to show that all CRMs with power-law representations, not just the beta process, have what Broderick et al. [2012] call Types I and II power law behavior. Our only condition is that the tails of $g$ are not too heavy.
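The equivalence $1 - e^{-\theta_k} \sim \theta_k$ that justifies swapping the Bernoulli likelihood for the Poisson one is easy to see numerically. The short sketch below is illustrative only; the function name is ours.

```python
import math

def prob_feature_present(theta):
    """P(z >= 1) for z ~ Poisson(theta), i.e. 1 - exp(-theta)."""
    # -expm1(-theta) avoids catastrophic cancellation for small theta.
    return -math.expm1(-theta)

if __name__ == "__main__":
    for theta in (1.0, 0.1, 0.001):
        # The ratio to theta approaches 1 as theta shrinks toward 0.
        print(theta, prob_feature_present(theta) / theta)
```

For large rates the two likelihoods differ noticeably, but the asymptotic analysis only involves the small rates $\theta_k$ with $k \to \infty$.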

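Theorem A.2.1. Assume that $g$ is a continuous density such that for some $\epsilon > 0$,

\[ g(x) = O\bigl(x^{-1-d-\epsilon}\bigr). \tag{28} \]

Then for $\Theta \leftarrow \text{PL-Rep}(\gamma, \alpha, d, g)$ with $d > 0$, there exists a constant $C$ depending on $\gamma$, $\alpha$, $d$, and $g$ such that, almost surely,

\[ \begin{aligned}
K_N &\sim \Gamma(1 - d)\,C N^d, & N &\to \infty, \\
K_{N,j} &\sim \frac{d\,\Gamma(j - d)}{j!}\,C N^d, & N &\to \infty \quad (j \ge 1).
\end{aligned} \]

In order to prove Theorem A.2.1, we require a number of additional definitions and lemmas. Our approach follows that in Broderick et al. [2012], which the reader is encouraged to consult for more details and further discussion of power law behavior of CRMs. Throughout this section, $\Theta \leftarrow \text{PL-Rep}(\gamma, \alpha, d, g)$ with $d > 0$. By Lemma 4.2.4, $\Theta \sim \mathrm{CRM}(\nu)$, where

\[ \nu(d\theta) := \int g(\theta/u)\,u^{-1}\,\nu_{\mathrm{BP}}(du)\,d\theta \]

and $\nu_{\mathrm{BP}}(d\theta)$ is the rate measure for $\mathrm{BP}(\gamma, \alpha, d)$. Let $\Pi_k$ be a homogeneous Poisson point process on $\mathbb{R}_+$ with rate $\theta_k$ and define

\[ K(t) := \sum_{k=1}^\infty \mathbb{1}\bigl[|\Pi_k \cap [0, t]| > 0\bigr], \qquad K_j(t) := \sum_{k=1}^\infty \mathbb{1}\bigl[|\Pi_k \cap [0, t]| = j\bigr]. \]

Furthermore, for $N \in \mathbb{N}$, let

\[ \Phi_N := E[K_N] \quad \text{and} \quad \Phi_{N,j} := E[K_{N,j}] \quad (j \ge 1), \]

and for $t > 0$, let

\[ \Phi(t) := E[K(t)] \quad \text{and} \quad \Phi_j(t) := E[K_j(t)] \quad (j \ge 1). \]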

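It follows from Campbell's Theorem [Kingman, 1993] that

\[ \begin{aligned}
\Phi(t) &= E\left[\sum_k \bigl(1 - e^{-t\theta_k}\bigr)\right] = \int \bigl(1 - e^{-t\theta}\bigr)\,\nu(d\theta), \\
\Phi_N &= E\left[\sum_k \bigl(1 - e^{-N\theta_k}\bigr)\right] = \int \bigl(1 - e^{-N\theta}\bigr)\,\nu(d\theta) = \Phi(N), \\
\Phi_j(t) &= E\left[\sum_k \frac{(t\theta_k)^j}{j!}\,e^{-t\theta_k}\right] = \frac{t^j}{j!}\int \theta^j e^{-t\theta}\,\nu(d\theta), \\
\Phi_{N,j} &= \binom{N}{j}\,E\left[\sum_k \bigl(1 - e^{-\theta_k}\bigr)^j e^{-(N-j)\theta_k}\right] = \binom{N}{j}\int \bigl(1 - e^{-\theta}\bigr)^j e^{-(N-j)\theta}\,\nu(d\theta).
\end{aligned} \]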


The first lemma characterizes the power law behavior of $\Phi(t)$ and $\Phi_j(t)$. A slowly varying function $\ell$ satisfies $\ell(ax)/\ell(x) \to 1$ as $x \to \infty$ for all $a > 0$.

Lemma A.2.2 (Broderick et al. [2012], Proposition 6.1). If for some $d \in (0, 1)$, $C > 0$, and slowly varying function $\ell$,

\[ \bar\nu[0, x] := \int_0^x \theta\,\nu(d\theta) \sim \frac{d}{1 - d}\,C\,\ell(1/x)\,x^{1-d}, \quad x \to 0, \tag{29} \]

then

\[ \begin{aligned}
\Phi(t) &\sim \Gamma(1 - d)\,C t^d, & t &\to \infty, \\
\Phi_j(t) &\sim \frac{d\,\Gamma(j - d)}{j!}\,C t^d, & t &\to \infty \quad (j \ge 1).
\end{aligned} \]

Transferring the power law behavior from $\Phi(t)$ to $\Phi_N$ is trivial since $\Phi(N) = \Phi_N$. The next lemma justifies transferring the power law behavior from $\Phi_j(t)$ to $\Phi_{N,j}$.

Lemma A.2.3 (Broderick et al. [2012], Lemmas 6.2 and 6.3). If $\nu$ satisfies Eq. (1), then

\[ K(t) \uparrow \infty \ \text{a.s.}, \qquad \Phi(t) \uparrow \infty, \qquad \Phi(t)/t \downarrow 0. \]

Furthermore,

\[ |\Phi_{N,j} - \Phi_j(N)| \le \frac{C_j}{N}\,\max\{\Phi_j(N), \Phi_{j+2}(N)\} \to 0. \]

The final lemma confirms that the asymptotic behaviors of $K_N$ and $K_{N,j}$ are almost surely the same as those of their expectations $\Phi_N$ and $\Phi_{N,j}$.

Lemma A.2.4. Assume $\nu$ satisfies Eq. (1) and that for some $d \in (0, 1)$, $C > 0$, $C_j > 0$, and slowly varying functions $\ell$, $\ell'$, $\Phi(t) \sim C\ell(t)t^d$ and $\Phi_j(t) \sim C_j\ell'(t)t^d$. Then for $N \to \infty$, almost surely

\[ K_N \sim \Phi_N \quad \text{and} \quad \sum_{i<j} K_{N,i} \sim \sum_{i<j} \Phi_{N,i}. \]


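Proof of Theorem A.2.1. Combining the three lemmas, the result follows as soon as we show that $\nu(d\theta)$ satisfies Eq. (29). $C$ will be a constant that may change from line to line. We begin by rewriting $\nu(d\theta)$ using the change of variable $w = \theta(u^{-1} - 1)$:

\[ \begin{aligned}
\nu(d\theta) &= C\int_0^1 g(\theta/u)\,u^{-2-d}(1 - u)^{\alpha+d-1}\,du\,d\theta \\
&= C\,\theta^{-1-d}\int_0^\infty g(w + \theta)\,\frac{w^{\alpha+d-1}}{(w + \theta)^{\alpha-1}}\,dw\,d\theta.
\end{aligned} \]

Since $g(x)$ is integrable and continuous, for $x \in [0, 1]$, it is upper-bounded by the non-integrable function $C_0 x^{-1}$ for some $C_0 > 0$. Combining this upper bound with Eq. (28) yields $g(x) \le \phi(x) := C_0 x^{-1}\mathbb{1}[x \le 1] + C_1 x^{-1-d-\epsilon}\mathbb{1}[x > 1]$ for some $C_1 > 0$, so

\[ g(w + \theta)\,\frac{w^{\alpha+d-1}}{(w + \theta)^{\alpha-1}} \le \phi(w)\,w^d. \]

Since $\phi(w)w^d$ is integrable, by dominated convergence the limit

\[ L := \lim_{\theta\to 0}\int_0^\infty g(w + \theta)\,\frac{w^{\alpha+d-1}}{(w + \theta)^{\alpha-1}}\,dw \]

exists and is finite. Moreover, since $g(x)$ is a continuous density, there exists $M > 0$ and $0 < a < b < \infty$ such that $g(x) \ge M$ for all $x \in [a, b]$. Hence, for $\theta < a$,

\[ \int_0^\infty g(w + \theta)\,\frac{w^{\alpha+d-1}}{(w + \theta)^{\alpha-1}}\,dw \ge M\int_{a-\theta}^{b-\theta} \frac{w^{\alpha+d-1}}{(w + \theta)^{\alpha-1}}\,dw > 0, \]

so $L > 0$. Thus,

\[ \psi(\theta) := \theta\int_0^1 g(\theta/u)\,u^{-2-d}(1 - u)^{\alpha+d-1}\,du \sim C\theta^{-d}, \quad \theta \to 0, \]

and hence for $\delta > 0$ and $\theta$ sufficiently small, $|\psi(\theta) - C\theta^{-d}| \le \delta$. Thus, for $x$ sufficiently small,

\[ \begin{aligned}
\int_0^x \psi(\theta)\,d\theta &\le \int_0^x C\theta^{-d}\,d\theta + \int_0^x \bigl|\psi(\theta) - C\theta^{-d}\bigr|\,d\theta \\
&\le \frac{C x^{1-d}}{1 - d} + \delta x \\
&\sim \frac{C x^{1-d}}{1 - d}, \quad x \to 0,
\end{aligned} \]

which shows that Eq. (29) holds. □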


A.3  Proofs  of  CRM  truncation  bounds 


A.3.1  Protobound 

Lemma A.3.1 (Protobound). Let $\Theta$ and $\Theta'$ be two discrete random measures. Let $X_{1:N}$ be a collection of random measures generated i.i.d. from $\Theta$ with $\mathrm{supp}(X_n) \subseteq \mathrm{supp}(\Theta)$, and let $Y_{1:N}$ be a collection of random variables where $Y_n$ is generated from $X_n$ via $Y_n \mid X_n \sim f(\cdot \mid X_n)$. Define $Z_{1:N}$ and $W_{1:N}$ analogously for $\Theta'$. Finally, define $Q := \mathbb{1}[\mathrm{supp}(X_{1:N}) \subseteq \mathrm{supp}(\Theta')]$. If $(X_{1:N} \mid \Theta, \Theta', Q = 1) \overset{d}{=} (Z_{1:N} \mid \Theta, \Theta')$ almost surely under the joint distribution of $\Theta, \Theta'$, then

\[ \tfrac{1}{2}\,\|p_Y - p_W\|_1 \le 1 - P(Q = 1), \]

where $p_Y, p_W$ are the marginal densities of $Y_{1:N}$ and $W_{1:N}$.

Proof of Lemma 4.3.1. This is the direct application of Lemma A.3.1 to CRMs, where $\Theta \sim \mathrm{CRM}(\nu)$, and $\Theta'$ is a truncation $\Theta' = \Theta_K$. The technical condition is satisfied because the weights in $X_{1:N}$ are sampled independently for each atom in $\Theta$. □

Proof of Lemma 4.4.1. This is the direct application of Lemma A.3.1 to NCRMs, where $\Theta$ is the normalization of a CRM with distribution $\mathrm{CRM}(\nu)$, and $\Theta'$ is the normalization of its truncation. The technical condition is satisfied because the conditioning on $X_{1:N} \subseteq \mathrm{supp}(\Theta')$ is equivalent to normalization of $\Theta'$. □


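Proof of Lemma A.3.1. We begin by expanding the 1-norm and conditioning on both $\Theta$ and $\Theta'$ (denoted by conditioning on $\bar\Theta := (\Theta, \Theta')$ for brevity):

\[ \begin{aligned}
\|p_Y - p_W\|_1 &= \int \left| E\left[\prod_{n=1}^N f(y_n \mid X_n)\right] - E\left[\prod_{n=1}^N f(y_n \mid Z_n)\right] \right| dy \\
&= \int \left| E\left[ E\left[\prod_{n=1}^N f(y_n \mid X_n) \,\Big|\, \bar\Theta\right] - E\left[\prod_{n=1}^N f(y_n \mid Z_n) \,\Big|\, \bar\Theta\right] \right] \right| dy.
\end{aligned} \]

Then conditioning on $Q$,

\[ \begin{aligned}
E\left[\prod_{n=1}^N f(y_n \mid X_n) \,\Big|\, \bar\Theta\right]
&= E\left[ E\left[\prod_{n=1}^N f(y_n \mid X_n) \,\Big|\, \bar\Theta, Q\right] \Big|\, \bar\Theta\right] \\
&= P(Q = 1 \mid \bar\Theta)\,E\left[\prod_{n=1}^N f(y_n \mid Z_n) \,\Big|\, \bar\Theta\right] + P(Q = 0 \mid \bar\Theta)\,E\left[\prod_{n=1}^N f(y_n \mid X_n) \,\Big|\, \bar\Theta, Q = 0\right],
\end{aligned} \]

where the first term arises from the fact that for any function $\phi$,

\[ E\bigl[\phi(Z_{1:N}) \mid \bar\Theta\bigr] = E\bigl[\phi(X_{1:N}) \mid \bar\Theta, Q = 1\bigr] \]

because $X_{1:N} \mid \bar\Theta, Q = 1$ is equal in distribution to $Z_{1:N} \mid \bar\Theta$ a.s. by assumption. Substituting this back in above,

\[ \begin{aligned}
\|p_Y - p_W\|_1
&= \int \left| E\left[ P(Q = 0 \mid \bar\Theta)\left( E\left[\prod_{n=1}^N f(y_n \mid Z_n) \,\Big|\, \bar\Theta\right] - E\left[\prod_{n=1}^N f(y_n \mid X_n) \,\Big|\, \bar\Theta, Q = 0\right] \right) \right] \right| dy \\
&\le \int E\left[ P(Q = 0 \mid \bar\Theta)\left| E\left[\prod_{n=1}^N f(y_n \mid Z_n) \,\Big|\, \bar\Theta\right] - E\left[\prod_{n=1}^N f(y_n \mid X_n) \,\Big|\, \bar\Theta, Q = 0\right] \right| \right] dy \\
&\le \int E\left[ P(Q = 0 \mid \bar\Theta)\left( E\left[\prod_{n=1}^N f(y_n \mid Z_n) \,\Big|\, \bar\Theta\right] + E\left[\prod_{n=1}^N f(y_n \mid X_n) \,\Big|\, \bar\Theta, Q = 0\right] \right) \right] dy,
\end{aligned} \tag{30} \]

and finally by Fubini's Theorem,

\[ \begin{aligned}
\|p_Y - p_W\|_1
&\le E\left[ P(Q = 0 \mid \bar\Theta)\left( E\left[\int \prod_{n=1}^N f(y_n \mid Z_n)\,dy \,\Big|\, \bar\Theta\right] + E\left[\int \prod_{n=1}^N f(y_n \mid X_n)\,dy \,\Big|\, \bar\Theta, Q = 0\right] \right) \right] \\
&= E\bigl[ P(Q = 0 \mid \bar\Theta)\bigl( E[1 \mid \bar\Theta] + E[1 \mid \bar\Theta, Q = 0] \bigr) \bigr] \\
&= 2P(Q = 0) = 2\,(1 - P(Q = 1)).
\end{aligned} \]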


□ 

Proof of Propositions 4.3.2 and 4.4.2. The same proof applies to both results. Since $G$ is non-atomic and $\Psi$ is Borel, there exists a measurable mapping $T : \Psi \to \mathbb{R}$ such that the random variable $T(\psi)$, $\psi \sim G$, is non-atomic. For an atomic measure $\varrho = \sum_{k=1}^{K_\varrho} w_k\delta_{\psi_k}$ on $\Psi$, define

\[ S(\varrho, T) := \sum_{k=1}^{K_\varrho} T(\psi_k). \]

Let $\psi_i, \psi'_j \overset{iid}{\sim} G$, $T_X := \#\{k \mid x_k > 0 \wedge z_k = 0\} < \infty$, and $T_Z := \#\{k \mid z_k > 0 \wedge x_k = 0\}$. Conditional on $Q = 0$, there exists $n \in [N]$ such that $T_{X_n} \ge 1$ since $X_n$ has at least one atom that $Z_n$ lacks. Using these definitions and facts, observe that

\[ \begin{aligned}
E\bigl[S(X_n, T) - S(Z_n, T) \mid Q = 0\bigr]
&= E\left[\sum_{i=1}^{T_{X_n}} T(\psi_i) - \sum_{j=1}^{T_{Z_n}} T(\psi'_j) \,\Big|\, Q = 0\right] \\
&= \sum_{t_X = 1}^\infty \sum_{t_Z = 0}^\infty P(T_{X_n} = t_X, T_{Z_n} = t_Z \mid Q = 0) \\
&\qquad \times E\left[\sum_{i=1}^{t_X} T(\psi_i) - \sum_{j=1}^{t_Z} T(\psi'_j) \,\Big|\, T_{X_n} = t_X, T_{Z_n} = t_Z, Q = 0\right].
\end{aligned} \]

Since $T_{X_n} \ge 1$, $V(t_X, t_Z) := \sum_{i=1}^{t_X} T(\psi_i) - \sum_{j=1}^{t_Z} T(\psi'_j)$ is a finite sum of one or more non-atomic random variables, so $V(t_X, t_Z)$ is itself non-atomic. It then follows that $\Delta := S(X_n, T) - S(Z_n, T)$ is non-atomic. Thus, by Lemma A.1.2, for any $\delta$, there exists $\epsilon_\delta > 0$ such that $P(|\Delta| \le \epsilon_\delta) \le \delta$. Define the family of likelihoods $f_\delta(\cdot \mid \varrho) = \mathrm{Unif}[S(\varrho, T) - \epsilon_\delta/2,\ S(\varrho, T) + \epsilon_\delta/2]$. Let $f = f_\delta$. Then, conditioned on $Q = 0$, with probability at least $1 - \delta$, $f(y \mid X_n) > 0 \implies f(y \mid Z_n) = 0$ and $f(y \mid Z_n) > 0 \implies f(y \mid X_n) = 0$, which implies that both inequalities in Eq. (30) are equalities. Hence, we conclude that $\|p_Y - p_W\|_1 \ge 2(1 - \delta)P(Q = 0)$. □


A.3.2  Series  representation  truncation 

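Recall from Section 4.3.1 that a series representation generally has the form

\[ \Theta = \sum_{k=1}^\infty \theta_k\delta_{\psi_k}, \qquad \theta_k = \tau(V_k, \Gamma_k), \qquad V_k \overset{iid}{\sim} g, \]

where $\Gamma_k = \sum_{\ell=1}^k E_\ell$, $E_\ell \overset{iid}{\sim} \mathrm{Exp}(1)$, are the jumps in a unit-rate homogeneous Poisson point process, $\tau : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ is a measurable function, $g$ is a distribution on $\mathbb{R}_+$, and $\lim_{u\to\infty} \tau(v, u) = 0$ for $g$-almost every $v$. Note that by Lemma A.1.3 $\bar\pi(0) = 0$, where $\bar\pi(x) := 1 - \pi(x)$. This fact will repeatedly prove useful for the proofs in this section.

The proof of Theorem 4.3.3 is based on the following lemma.

Lemma A.3.2. Under the same hypotheses as Theorem 4.3.3,

\[ P\bigl(\mathrm{supp}(X_{1:N}) \subseteq \mathrm{supp}(\Theta_K)\bigr) = E\left[\exp\left\{-\int_0^\infty \left(1 - \int \pi(\tau(v, u + G_K))^N\,g(dv)\right) du\right\}\right]. \tag{31} \]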


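Proof. Let

\[ p(t, K) := E\left[\prod_{k=K+1}^\infty \pi(\tau(V_k, \Gamma_k + t))^N\right], \]

so $p(0, K) = P(\mathrm{supp}(X_{1:N}) \subseteq \mathrm{supp}(\Theta_K))$. We use the proof strategy from Banjevic et al. [2002] and induction in $K$. For $K = 0$,

\[ \begin{aligned}
p(t, 0) &= E\left[ E\left[ \pi(\tau(V_1, u + t))^N \prod_{k=2}^\infty \pi\Bigl(\tau\Bigl(V_k,\ u + t + \sum_{1 < j \le k} E_j\Bigr)\Bigr)^N \,\Big|\, \Gamma_1 = u \right] \right] \\
&= \int_0^\infty\!\!\int \pi(\tau(v, u + t))^N\,p(u + t, 0)\,e^{-u}\,g(dv)\,du
\end{aligned} \]

since the $V_k$ are i.i.d. Multiplying both sides by $e^{-t}$ and making the change of variable $w = u + t$ yields

\[ e^{-t}p(t, 0) = \int_t^\infty\!\!\int \pi(\tau(v, w))^N\,p(w, 0)\,e^{-w}\,g(dv)\,dw. \]

Differentiating both sides with respect to $t$ and rearranging yields

\[ \frac{\partial p(t, 0)}{\partial t} = p(t, 0)\left(1 - \int \pi(\tau(v, t))^N\,g(dv)\right). \tag{32} \]

Since $\lim_{u\to\infty} \tau(v, u) = 0$ and $\pi(0) = 1$ by Lemma A.1.3, we can solve Eq. (32) and conclude that

\[ \begin{aligned}
p(t, 0) &= \exp\left\{-\int_t^\infty \left(1 - \int \pi(\tau(v, u))^N\,g(dv)\right) du\right\} \\
&= \exp\left\{-\int_0^\infty \left(1 - \int \pi(\tau(v, u + t))^N\,g(dv)\right) du\right\}.
\end{aligned} \]

We use the inductive hypothesis that

\[ p(t, K) = E[p(t + G_K, 0)], \qquad G_K \sim \mathrm{Gam}(K, 1), \quad G_0 = 0, \]

which trivially holds for $K = 0$. If the inductive hypothesis holds for some $K \ge 0$, then using the tower property,

\[ \begin{aligned}
p(t, K + 1) &= E\left[ E\left[ \prod_{k=K+2}^\infty \pi\Bigl(\tau\Bigl(V_k,\ \sum_{1 < j \le k} E_j + u + t\Bigr)\Bigr)^N \,\Big|\, \Gamma_1 = u \right] \right] \\
&= E[p(t + E_1, K)], \qquad E_1 \sim \mathrm{Exp}(1) \\
&= E[p(t + G_K + E_1, 0)] \\
&= E[p(t + G_{K+1}, 0)].
\end{aligned} \]

Eq. (31) follows by setting $t = 0$.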


□ 


Proof of Theorem 4.3.3. The main result follows by combining Lemmas 4.3.1 and A.3.2, applying Jensen's inequality, then using monotone convergence. The upper bound $1 - e^{-B_{N,K}} \le 1$ follows immediately from the fact that the integral

\[ \int_0^\infty \left(1 - \int \pi(\tau(v, u + G_K))^N\,g(dv)\right) du \]

is non-negative.

Fix $N$. It follows from Eq. (2) that

\[ 1 - P\bigl(\mathrm{supp}(X_{1:N}) \subseteq \mathrm{supp}(\Theta_K)\bigr) \to 0 \quad \text{as } K \to \infty. \]

It then follows from Lemma A.1.1 that $1 - e^{-\omega_{N,K}} \overset{p}{\to} 0$, where $\omega_{N,K} := \int_0^\infty \bigl(1 - \int \pi(\tau(v, u + G_K))^N\,g(dv)\bigr)\,du$. By the continuous mapping theorem, conclude that $\omega_{N,K} \overset{p}{\to} 0$ as $K \to \infty$ and hence $B_{N,K} = E[\omega_{N,K}] \to 0$ as $K \to \infty$. □

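Theorem A.3.3 (Inverse-Lévy representation truncation error). For $\Theta \leftarrow \text{IL-Rep}(\nu)$, the conclusions of Theorem 4.3.3 hold with

\[ B_{N,K} = N\int_0^\infty F_K(\nu[x, \infty))\,(1 - \pi(x))\,\nu(dx). \]

Proof. We have from Theorem 4.3.3 that

\[ B_{N,K} = N\,E\left[\int_0^\infty \bigl(1 - \pi(\nu^{\leftarrow}(u + G_K))\bigr)\,du\right]. \]

We first make the change of variables $x = \nu^{\leftarrow}(u)$ to obtain

\[ \int_0^\infty \bar\pi(\nu^{\leftarrow}(u + G_K))\,du = \int_{G_K}^\infty \bar\pi(\nu^{\leftarrow}(u))\,du = \int_0^{\nu^{\leftarrow}(G_K)} \bar\pi(x)\,\nu(dx). \]

Finally, use the fact that for all $a, b \ge 0$, $\nu^{\leftarrow}(a) \ge b \iff a \le \nu([b, \infty))$ and monotone convergence:

\[ \begin{aligned}
B_{N,K} &= N\,E\left[\int_0^{\nu^{\leftarrow}(G_K)} \bar\pi(x)\,\nu(dx)\right] \\
&= N\,E\left[\int_0^\infty \mathbb{1}[x \le \nu^{\leftarrow}(G_K)]\,\bar\pi(x)\,\nu(dx)\right] \\
&= N\,E\left[\int_0^\infty \mathbb{1}[G_K \le \nu[x, \infty)]\,\bar\pi(x)\,\nu(dx)\right] \\
&= N\int_0^\infty F_K(\nu[x, \infty))\,\bar\pi(x)\,\nu(dx).
\end{aligned} \]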


□ 
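The last step of this proof replaces $E\,\mathbb{1}[G_K \le a]$ by $F_K(a)$. The sketch below illustrates that identity numerically under the assumption, not stated in this section, that $F_K$ denotes the cdf of $G_K \sim \mathrm{Gam}(K, 1)$; for integer $K \ge 1$ this is the Erlang cdf. All function names and the example values are hypothetical.

```python
import math
import random

def erlang_cdf(a, K):
    """P(G_K <= a) for G_K ~ Gam(K, 1) with integer K >= 1 (assumed meaning of F_K)."""
    if a <= 0:
        return 0.0
    return 1.0 - math.exp(-a) * sum(a ** i / math.factorial(i) for i in range(K))

def mc_indicator_mean(a, K, n=100_000, seed=0):
    """Monte Carlo estimate of E[1{G_K <= a}], with G_K a sum of K Exp(1) jumps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        g = sum(rng.expovariate(1.0) for _ in range(K))
        hits += g <= a
    return hits / n

if __name__ == "__main__":
    a, K = 2.5, 3  # 'a' plays the role of nu[x, inf) for some fixed x
    print(erlang_cdf(a, K), mc_indicator_mean(a, K))
```

Because $F_K(a)$ decreases quickly in $K$ for fixed $a$, the integrand of $B_{N,K}$ is suppressed wherever the tail mass $\nu[x, \infty)$ is small relative to $K$.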

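Theorem A.3.4 (Thinning representation truncation error). For $\Theta \leftarrow \text{T-Rep}(\nu)$, the conclusions of Theorem 4.3.3 hold with

\[ B_{N,K} = N\int_0^\infty (1 - \pi(v))\int_0^{\frac{d\nu}{dg}(v)} F_K\!\left(\frac{d\nu}{dg}(v) - u\right) du\,g(v)\,dv. \]

Proof. We have from Theorem 4.3.3 that

\[ B_{N,K} = N\,E\left[\int_0^\infty\!\!\int_0^\infty \bar\pi\!\left(v\,\mathbb{1}\!\left[\frac{d\nu}{dg}(v) \ge u + G_K\right]\right) du\,g(v)\,dv\right]. \]

Since $\bar\pi(0) = 0$, using monotone convergence we have

\[ \begin{aligned}
B_{N,K} &= N\int_0^\infty \bar\pi(v)\int_0^\infty E\left[\mathbb{1}\!\left[\frac{d\nu}{dg}(v) \ge u + G_K\right]\right] du\,g(v)\,dv \\
&= N\int_0^\infty \bar\pi(v)\int_0^\infty F_K\!\left(\frac{d\nu}{dg}(v) - u\right) du\,g(v)\,dv \\
&= N\int_0^\infty \bar\pi(v)\int_0^{\frac{d\nu}{dg}(v)} F_K\!\left(\frac{d\nu}{dg}(v) - u\right) du\,g(v)\,dv \\
&= N\int_0^\infty \bar\pi(v)\int_0^{\frac{d\nu}{dg}(v)} F_K(u)\,du\,g(v)\,dv.
\end{aligned} \]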


□ 

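Theorem A.3.5 (Rejection representation truncation error). For $\Theta \leftarrow \text{R-Rep}(\nu)$, the conclusions of Theorem 4.3.3 hold with

\[ B_{N,K} = N\int_0^\infty F_K(\mu[x, \infty))\,(1 - \pi(x))\,\nu(dx). \]

Proof. We have from Theorem 4.3.3 that

\[ B_{N,K} = N\,E\left[\int_0^\infty\!\!\int_0^1 \bar\pi\!\left(\mu^{\leftarrow}(u + G_K)\,\mathbb{1}\!\left[\frac{d\nu}{d\mu}\bigl(\mu^{\leftarrow}(u + G_K)\bigr) \ge v\right]\right) dv\,du\right]. \]

Since $\bar\pi(0) = 0$, we can eliminate the innermost integral:

\[ \int_0^1 \bar\pi\!\left(\mu^{\leftarrow}(u + G_K)\,\mathbb{1}\!\left[\frac{d\nu}{d\mu}\bigl(\mu^{\leftarrow}(u + G_K)\bigr) \ge v\right]\right) dv = \frac{d\nu}{d\mu}\bigl(\mu^{\leftarrow}(u + G_K)\bigr)\,\bar\pi\bigl(\mu^{\leftarrow}(u + G_K)\bigr). \]

Making the change of variable $x = \mu^{\leftarrow}(u + G_K)$ and reasoning analogously to the proof of Theorem A.3.3, we obtain

\[ \begin{aligned}
B_{N,K} &= N\,E\left[\int_0^{\mu^{\leftarrow}(G_K)} \frac{d\nu}{d\mu}(x)\,\bar\pi(x)\,\mu(dx)\right] \\
&= N\int_0^\infty F_K(\mu[x, \infty))\,\bar\pi(x)\,\nu(dx).
\end{aligned} \]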


□ 


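Theorem A.3.6 (Bondesson representation truncation error). For $\Theta \leftarrow \text{B-Rep}(\nu)$, the hypotheses of Theorem 4.3.3 are satisfied and its conclusions hold with

\[ B_{N,K} = N\int_0^\infty \bigl(1 - E\bigl[\pi(v e^{-G_K})\bigr]\bigr)\,\nu(dv). \]

Proof. While Theorem A.3.6 can be proved using Theorem 4.3.3, we take an alternative approach using more direct Poisson process arguments.

Lemma A.3.7. For $K \ge 0$, $G_K \sim \mathrm{Gam}(K, c)$, and $G_0 = 0$,

\[ P\bigl(\mathrm{supp}(X_{1:N}) \subseteq \mathrm{supp}(\Theta_K)\bigr) = E\left[\exp\left\{-\int \bigl(1 - \pi(v e^{-G_K})^N\bigr)\,\nu(dv)\right\}\right]. \]

Proof of Lemma A.3.7. For $t \ge 0$, the measure $t\Theta$ has distribution $t\Theta \sim \mathrm{CRM}(\nu_t)$ where $\nu_t(d\theta) := \nu(d\theta/t)$. Further, define $X_n \mid \Theta \overset{iid}{\sim} \mathrm{LP}(h, t\Theta)$, and

\[ p(t, K) := P\bigl(\mathrm{supp}(X_{1:N}) \subseteq \mathrm{supp}(t\Theta_K)\bigr) = E\left[\prod_{k=K+1}^\infty \pi\bigl(tV_k e^{-\Gamma_k/c}\bigr)^N\right]. \]

We will prove that

\[ p(t, K) = E\left[\exp\left\{-\int \bigl(1 - \pi(tv e^{-G_K})^N\bigr)\,\nu(dv)\right\}\right] \]

and then set $t = 1$ to obtain the desired result. The proof proceeds by induction. For $K = 0$, the event $\mathrm{supp}(X_{1:N}) \subseteq \mathrm{supp}(t\Theta_K)$ is equal to $\mathrm{supp}(X_{1:N}) \subseteq \emptyset$ and thus $\mathrm{supp}(X_{1:N}) = \emptyset$. Its probability is in turn equal to the probability that after thinning a $\mathrm{CRM}(\nu_t)$ by $\pi(\theta)$ $N$ times, the remaining process has no atoms, i.e. the probability that $\mathrm{CRM}\bigl((1 - \pi(\theta)^N)\,\nu_t(d\theta)\bigr)$ has no atoms. Since a Poisson process with measure $\mu(d\theta)$ has no atoms with probability $e^{-\int \mu(d\theta)}$,

\[ \begin{aligned}
p(t, 0) &= \exp\left(-\int \bigl(1 - \pi(\theta)^N\bigr)\,\nu_t(d\theta)\right) \\
&= E\left[\exp\left\{-\int \bigl(1 - \pi(tv e^{-G_0})^N\bigr)\,\nu(dv)\right\}\right].
\end{aligned} \]

The second equality follows by the change of variables $v = \theta/t$ and because $G_0 = 0$ with probability 1. The inductive hypothesis is that for $K \ge 0$,

\[ p(t, K) = E\bigl[p\bigl(te^{-G_K}, 0\bigr)\bigr], \qquad G_K \sim \mathrm{Gam}(K, c). \]

Using the tower property to condition on $\Gamma_1/c$ and the fact that the $V_k$ are i.i.d.,

\[ \begin{aligned}
p(t, K + 1) &= E\left[\prod_{k=K+2}^\infty \pi\bigl(tV_k e^{-\Gamma_k/c}\bigr)^N\right] \\
&= E\left[ E\left[\prod_{k=K+1}^\infty \pi\bigl(te^{-\Gamma_1/c}\,V_k e^{-\Gamma_k/c}\bigr)^N \,\Big|\, \Gamma_1/c\right] \right] \\
&= E\bigl[p\bigl(te^{-\Gamma_1/c}, K\bigr)\bigr] = E\bigl[p\bigl(te^{-(\Gamma_1/c + G_K)}, 0\bigr)\bigr] = E\bigl[p\bigl(te^{-G_{K+1}}, 0\bigr)\bigr],
\end{aligned} \]

since $\Gamma_1/c \sim \mathrm{Exp}(c)$. The desired result follows by setting $t = 1$.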


□ 


First combine Lemmas 4.3.1 and A.3.7, then apply Jensen's inequality. The bounds on $B_{N,K}$ and the fact that $\lim_{K\to\infty} B_{N,K} = 0$ follow by the same arguments as in the proof of Theorem 4.3.3. □

A.3.3 Superposition representation truncation

Proof of Theorem 4.3.4. We begin with Lemma 4.3.1, and note that

\[ P\bigl(\mathrm{supp}(X_{1:N}) \subseteq \mathrm{supp}(\Theta_K)\bigr) = P\bigl(\mathrm{supp}(X_{1:N}) \cap \mathrm{supp}(\Theta_K^+) = \emptyset\bigr). \]

After generating $X_{1:N}$ from $\Theta$, we can view the point process representing the atoms in $\Theta_K^+$ not contained in any $X_n$ as $\Theta_K^+$ thinned by $\pi(\theta)^N$ (i.e. the Bernoulli trial with success probability $1 - \pi(\theta)$ to generate an atom failed $N$ times), and thus the remaining process is $\Theta_K^+$ thinned by $1 - \pi(\theta)^N$. Therefore, the above event is equivalent to the event that $\Theta_K^+$ thinned by $1 - \pi(\theta)^N$ has no atoms. Using the fact that a Poisson process with measure $\mu(d\theta)$ has no atoms with probability $e^{-\int \mu(d\theta)}$, we have the formula for $B_{N,K}$,

\[ P\bigl(\mathrm{supp}(X_{1:N}) \subseteq \mathrm{supp}(\Theta_K)\bigr) = e^{-\int (1 - \pi(\theta)^N)\,\nu_K^+(d\theta)}. \]

Since $B_{N,K} := \int (1 - \pi(\theta)^N)\,\nu_K^+(d\theta)$ is nonnegative, the error bound lies in the interval $[0, 1]$. To show that $\lim_{K\to\infty} B_{N,K} = 0$, first note that

\[ \int \bigl(1 - \pi(\theta)^N\bigr)\,\nu(d\theta) \le N\int (1 - \pi(\theta))\,\nu(d\theta) = N\sum_{x=1}^\infty \int h(x \mid \theta)\,\nu(d\theta) < \infty, \tag{33} \]

by Eq. (2). Further, splitting $\nu$ into its individual summed components, we have that

\[ \int (1 - \pi(\theta))\,\nu(d\theta) = \sum_{k=1}^{K} \int (1 - \pi(\theta))\,\nu_k(d\theta) + \int (1 - \pi(\theta))\,\nu_K^+(d\theta). \tag{34} \]

Combining the results from Eqs. (33) and (34) yields

\[ \lim_{K\to\infty} \int (1 - \pi(\theta))\,\nu_K^+(d\theta) = 0. \]


□ 


A.3.4  Stochastic  mapping  truncation 

Proof  of  Proposition  4.3.5.  Let  i t(u)  =  h(0\u).  For  notational  brevity,  define  0  to  be  the 
event  where  supp(A"i:jv)  C  supp(0A),  and  Q  to  be  the  corresponding  transformed  event  where 
supp(X1:Ar)  C  supp(0A).  Then  we  have 


p(Q>  =  e  nr./f+i#(“0 


N 


If  h(x  |  9)  =  Bern(x;  1  —  ttk^(9))  and  N  —  1,  then 


p(q)  =  e  n 


lk=K+l 


fn(u)Nn(9k,  dw) 


p(g). 


□

A.3.5 Truncation with hyperpriors

Proof of Proposition 4.3.6. By repeating the proof of Lemma 4.3.1 in Appendix A.3.1, except with an additional use of the tower property to condition on the hyperparameters $\Phi$, an additional use of Fubini's theorem to swap integration and expectation, and Jensen's inequality, we have

\[ \begin{aligned}
\tfrac{1}{2}\,\|p_Y - p_W\|_1 &\le E\bigl[1 - P\bigl(\mathrm{supp}(X_{1:N}) \subseteq \mathrm{supp}(\Theta_K) \mid \Phi\bigr)\bigr] \\
&\le E\bigl[1 - e^{-B_{N,K}(\Phi)}\bigr] \\
&\le 1 - e^{-E[B_{N,K}(\Phi)]}.
\end{aligned} \]

□


A.4  Proofs  of  normalized  truncation  bounds 

Proof of Lemma 4.4.3. First, we demonstrate that the argmax is well-defined. Note that

\[ \operatorname*{arg\,max}_{i \in \mathbb{N}}\ T_i + \log p_i = \operatorname*{arg\,max}_{i \in \mathbb{N}}\ \exp(T_i + \log p_i) \]

if it exists, due to the monotonicity of $\exp$. Similarly, existence of either proves the existence of the other. Since $T_i$ are i.i.d. $\mathrm{Gumbel}(0, 1)$,

\[ \begin{aligned}
P\bigl(\exp(T_i + \log p_i) > \epsilon\bigr) &= 1 - \exp\bigl(-e^{-(\log\epsilon - \log p_i)}\bigr) \\
&= 1 - \exp(-p_i/\epsilon) \le 1 - (1 - p_i/\epsilon).
\end{aligned} \]

Therefore,

\[ \sum_{i=1}^\infty P\bigl(\exp(T_i + \log p_i) > \epsilon\bigr) \le \sum_{i=1}^\infty 1 - (1 - p_i/\epsilon) = \epsilon^{-1}\sum_{i=1}^\infty p_i < \infty. \]

This is sufficient to demonstrate that

\[ \exp(T_i + \log p_i) \overset{a.s.}{\to} 0 \quad \text{as } i \to \infty. \]

Finally, since any positive sequence converging to 0 can have only a finite number of elements greater than any $\epsilon > 0$, set $\epsilon = \exp(T_1 + \log p_1)$, and thus

\[ \operatorname*{arg\,max}_{i \in \mathbb{N}}\ \exp(T_i + \log p_i) = \operatorname*{arg\,max}_{i\,:\,\exp(T_i + \log p_i) \ge \epsilon}\ \exp(T_i + \log p_i), \]

where the right hand side exists because it computes the maximum of a finite, nonempty set of numbers. Note that the arg max is guaranteed to be a single element, since $T_i + \log p_i$ has a purely diffuse distribution on $\mathbb{R}$.

Now that the a.s. existence and uniqueness of the arg max has been demonstrated, we can compute its distribution. First, note that

\[ P\bigl(T_i + \log p_i \le x\ \ \forall i \in \mathbb{N},\ i \ne j\bigr) = \prod_{i=1,\,i \ne j}^\infty \exp\bigl(-e^{-(x - \log p_i)}\bigr) = \exp\bigl(-e^{-x}(s - p_j)\bigr), \]

where $s := \sum_i p_i$. So then

\[ \begin{aligned}
P\Bigl(j = \operatorname*{arg\,max}_{i \in \mathbb{N}}\ T_i + \log p_i\Bigr) &= P\bigl(T_i + \log p_i \le T_j + \log p_j\ \ \forall i \in \mathbb{N}\bigr) \\
&= \int e^{-e^{-x}(s - p_j)}\,e^{-(x - \log p_j) - e^{-(x - \log p_j)}}\,dx \\
&= p_j\int e^{-s e^{-x}}\,e^{-x}\,dx \\
&= \frac{p_j}{\sum_i p_i}\int e^{-e^{-(x - \log s)}}\,e^{-(x - \log s)}\,dx \\
&= \frac{p_j}{\sum_i p_i},
\end{aligned} \]

where the last integral is 1 since its integrand is the $\mathrm{Gumbel}(\log s, 1)$ density.


□ 
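The conclusion of Lemma 4.4.3, that the index maximizing $T_i + \log p_i$ is distributed as $p_j / \sum_i p_i$, can be illustrated by simulation. The sketch below is a finite-dimensional illustration only (the lemma itself covers summable infinite sequences); the function names and the example weights are ours.

```python
import math
import random

def gumbel(rng):
    """Draw T ~ Gumbel(0, 1) by inverting its cdf exp(-exp(-x))."""
    u = rng.random()
    while u == 0.0:  # rng.random() lies in [0, 1); avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def gumbel_argmax(p, rng):
    """Index maximizing T_i + log p_i over a finite list of positive weights p."""
    scores = [gumbel(rng) + math.log(p_i) for p_i in p]
    return max(range(len(p)), key=scores.__getitem__)

def empirical_frequencies(p, n=60_000, seed=0):
    """Fraction of n independent trials in which each index attains the maximum."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    for _ in range(n):
        counts[gumbel_argmax(p, rng)] += 1
    return [c / n for c in counts]

if __name__ == "__main__":
    p = [0.5, 1.5, 3.0]  # unnormalized weights; targets are p_j / sum(p)
    print(empirical_frequencies(p), [p_j / sum(p) for p_j in p])
```

Note that the weights need not be normalized: the perturbation-and-maximization step performs the normalization implicitly, which is what makes the trick useful for normalized CRMs.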


A.4.1  Normalized  series  representation  truncation 

Proof of Theorem 4.4.4. First, we apply Lemma 4.4.1,

\[ \tfrac{1}{2}\,\|p_Y - p_W\|_1 \le 1 - P\bigl(X_{1:N} \subseteq \mathrm{supp}(\Xi_K)\bigr). \]

Next, by Jensen's inequality,

\[ P\bigl(X_{1:N} \subseteq \mathrm{supp}(\Xi_K)\bigr) = E\left[\left(\frac{\Theta_K(\Psi)}{\Theta(\Psi)}\right)^N\right] \ge E\left[\frac{\Theta_K(\Psi)}{\Theta(\Psi)}\right]^N = P\bigl(X_1 \in \mathrm{supp}(\Xi_K)\bigr)^N. \]

The remaining part of this proof quantifies the probability that sampling $X_1$ from $\Xi$ generates an atom in the support of $\Xi_K$ (equal to the support of $\Theta_K$, since $\Xi$ is just the normalization of $\Theta$). To do this, we use the trick based on Lemma 4.4.3: we log-transform the rates in $\Theta$, perturb them all by i.i.d. Gumbel random variables, and quantify the probability that the max occurs within the atoms of $\Theta_K$.

First, we split the sequential representation of $\Theta$ into the truncation $\Theta_K$ and its tail $\Theta_K^+$, using the form from Eq. (6),

\[ \Theta = \sum_{k=1}^\infty \tau(V_k, \Gamma_k)\,\delta_{\psi_k} = \sum_{k=1}^{K} \tau(V_k, \Gamma_k)\,\delta_{\psi_k} + \sum_{k=K+1}^\infty \tau(V_k, \Gamma_k)\,\delta_{\psi_k} := \Theta_K + \Theta_K^+. \]

Next, we define the maximum of the log-transformed, Gumbel perturbed rates in $\Theta_K$ as

\[ M_K := \max_{1 \le k \le K}\ \log\tau(V_k, \Gamma_k) + W_k, \]

where $W_k \overset{iid}{\sim} \mathrm{Gumbel}(0, 1)$. Since $\Gamma_k$ are from a unit-rate homogeneous Poisson process, if we condition on the value of $\Gamma_K$, this is equivalent to conditioning on the event that the Poisson process has exactly $K - 1$ atoms on $[0, \Gamma_K]$, and the $K$th atom is at $\Gamma_K$. And since $M_K$ does not depend on the ordering of $(\Gamma_k)_{k=1}^K$,

\[ M_K \mid \Gamma_K \overset{d}{=} \max\Bigl\{\log\tau(V_K, \Gamma_K) + W_K,\ \max_{1 \le k \le K-1}\ \log\tau(V_k, U_k) + W_k\Bigr\}, \]

where $U_k \overset{iid}{\sim} \mathrm{Unif}(0, \Gamma_K)$. Therefore, $M_K \mid \Gamma_K$ is the maximum of a collection of independent random variables, so we can compute its CDF by simply taking the product of the CDFs of each of those random variables. Using standard techniques for transformation of independent random variables, we have that

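\[
\begin{aligned}
\mathbb{P}\left(\log \tau(V_k, U_k) + W_k \le x \mid \Gamma_K\right)
&= \int_0^\infty \int_0^1 g(v)\, e^{-\tau(v, \Gamma_K u) e^{-x}}\, du\, dv, \qquad k < K, \\
\mathbb{P}\left(\log \tau(V_K, \Gamma_K) + W_K \le x \mid \Gamma_K\right)
&= \int_0^\infty g(v)\, e^{-\tau(v, \Gamma_K) e^{-x}}\, dv,
\end{aligned}
\]
so
\[
\mathbb{P}\left(M_K \le x \mid \Gamma_K\right)
= \left( \int_0^\infty \int_0^1 g(v)\, e^{-\tau(v, \Gamma_K u) e^{-x}}\, du\, dv \right)^{K-1}
\left( \int_0^\infty g(v)\, e^{-\tau(v, \Gamma_K) e^{-x}}\, dv \right).
\]
Defining the function
\[
J(u, t) := \mathbb{E}\left[e^{-t \cdot \tau(V, u)}\right], \qquad V \sim g,
\]
we have
\[
\mathbb{P}\left(M_K \le x \mid \Gamma_K\right)
= \left( \int_0^1 J(\Gamma_K u, e^{-x})\, du \right)^{K-1} J(\Gamma_K, e^{-x}).
\]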

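Next, we define an analogous maximum for the tail process rates in $\Theta_K^+$,
\[
M_K^+ := \sup_{k > K} \log \tau(V_k, \Gamma_k) + W_k.
\]
Conditioning on $\Gamma_K$, we have
\[
M_K^+ \mid \Gamma_K \overset{d}{=} \sup_{k \ge 1} \log \tau(V_k, \Gamma'_k + \Gamma_K) + W_k,
\]
where $\Gamma'_k$ is a unit-rate homogeneous Poisson process on $\mathbb{R}_+$. Now note that since $\Gamma'_k$ is a Poisson
point process on $\mathbb{R}_+$, the points $\log \tau(V_k, \Gamma'_k + \Gamma_K) + W_k$ also form a Poisson point process, now on $\mathbb{R}$ (using Poisson process stochastic mapping),
with rate measure
\[
\left( \int_0^\infty \int_0^\infty e^{-(t - \log \tau(v, u + \Gamma_K)) - e^{-(t - \log \tau(v, u + \Gamma_K))}}\, g(v)\, du\, dv \right) dt.
\]
Therefore, $\mathbb{P}(M_K^+ \le x \mid \Gamma_K)$ is equal to the probability that the above Poisson point process has no
atoms with position greater than $x$. Since a Poisson process on $\mathbb{R}$ with measure $\mu$ has no atoms
above $x$ with probability $e^{-\int_x^\infty \mu(dt)}$,
\[
\mathbb{P}\left(M_K^+ \le x \mid \Gamma_K\right)
= \exp\left( -\int_x^\infty \int_0^\infty \int_0^\infty e^{-(t - \log \tau(v, u + \Gamma_K)) - e^{-(t - \log \tau(v, u + \Gamma_K))}}\, g(v)\, du\, dv\, dt \right).
\]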
Noticing that the integrand in $t$ is a Gumbel density, we can use Fubini's theorem to swap integrals
and evaluate:
\[
\begin{aligned}
\mathbb{P}\left(M_K^+ \le x \mid \Gamma_K\right)
&= \exp\left( \int_0^\infty \int_0^\infty \left( e^{-\tau(v, u + \Gamma_K) e^{-x}} - 1 \right) g(v)\, du\, dv \right) \\
&= \exp\left( \int_0^\infty \left( J(u + \Gamma_K, e^{-x}) - 1 \right) du \right).
\end{aligned}
\]
Taking the derivative yields the density of $M_K^+ \mid \Gamma_K$ with respect to the Lebesgue measure. Therefore,
using the Gumbel-max trick from Lemma 4.4.3, we substitute the results for $M_K \mid \Gamma_K$ and
$M_K^+ \mid \Gamma_K$ into the original bound, yielding

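\[
\begin{aligned}
\tfrac{1}{2} \left\| p_Y - p_W \right\|_1
&\le 1 - \mathbb{P}\left(X_1 \in \operatorname{supp}(\Xi_K)\right)^{N} \\
&= 1 - \left(1 - \mathbb{E}\left[\mathbb{P}\left(M_K < M_K^+ \mid \Gamma_K\right)\right]\right)^{N},
\end{aligned}
\]
where (using the substitution $t = e^{-x}$)
\[
\begin{aligned}
\mathbb{P}\left(M_K < M_K^+ \mid \Gamma_K\right)
&= \int_{-\infty}^{\infty} \mathbb{P}\left(M_K \le x \mid \Gamma_K\right) \frac{d}{dx} \mathbb{P}\left(M_K^+ \le x \mid \Gamma_K\right) dx \\
&= \int_0^\infty J(\Gamma_K, t) \left( \int_0^1 J(\Gamma_K u, t)\, du \right)^{K-1}
\left( -\frac{d}{dt} \exp\left( \int_0^\infty \left( J(u + \Gamma_K, t) - 1 \right) du \right) \right) dt.
\end{aligned}
\]
The fact that the bound is between 0 and 1 is a simple consequence of the fact
that $\mathbb{P}(X_1 \in \operatorname{supp}(\Xi_K)) \in [0, 1]$, and the asymptotic result follows from the fact that
$\mathbb{P}(X_1 \in \operatorname{supp}(\Xi_K)) \to 1$ as $K \to \infty$. □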
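To make the tail CDF in the proof above concrete, here is a short worked example under assumptions that are not taken from the text: the transformation is taken to be $\tau(v, u) = v e^{-u/c}$ for some constant $c > 0$, and $g$ is taken to be the Exp(1) density. Both choices are purely illustrative; they were picked because $J$ and the integral over $u$ then have closed forms.

```latex
% Illustrative assumptions (not from the text): \tau(v,u) = v e^{-u/c}, c > 0, g = Exp(1).
\begin{align*}
J(u, t) &= \mathbb{E}\left[e^{-t V e^{-u/c}}\right] = \frac{1}{1 + t e^{-u/c}}, \\
\int_0^\infty \left( J(u + \Gamma_K, t) - 1 \right) du
  &= -\int_0^\infty \frac{t e^{-(u + \Gamma_K)/c}}{1 + t e^{-(u + \Gamma_K)/c}}\, du
   = -c \log\left(1 + t e^{-\Gamma_K/c}\right), \\
\mathbb{P}\left(M_K^+ \le x \mid \Gamma_K\right)
  &= \left(1 + e^{-x} e^{-\Gamma_K/c}\right)^{-c}.
\end{align*}
```

As a sanity check, the last expression tends to 1 as $x \to \infty$ and to 0 as $x \to -\infty$, so it is a valid CDF, and for fixed $x$ it increases toward 1 as $\Gamma_K$ grows, which matches the intuition that the tail rates shrink as more atoms are retained.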


A.4.2  Normalized  superposition  representation  truncation 

Proof of Theorem 4.4.5. The same initial technique as in the proof of Theorem 4.4.4 yields
\[
\tfrac{1}{2} \left\| p_Y - p_W \right\|_1 \le 1 - \mathbb{P}\left(X_1 \in \operatorname{supp}(\Xi_K)\right)^{N}.
\]

The remaining part of this proof quantifies the probability that sampling $X_1$ from $\Xi$ generates an
atom in the support of $\Xi_K$. Since most of the following developments are similar for $\Theta_K, \nu_K$
and $\Theta_K^+, \nu_K^+$, we will focus the discussion on $\Theta_K, \nu_K$ and reintroduce the tail quantities when
necessary. First, we transform the rates of $\Theta_K$ under the stochastic mapping $w = \log \theta + W$,
where $W \sim \text{Gumbel}(0, 1)$, resulting in a new Poisson point process with rate measure
\[
\left( \int e^{-(t - w) - e^{-(t - w)}}\, e^{w} \nu_K(e^{w})\, dw \right) dt.
\]


The probability that all points in this Poisson point process are less than a value $x$ is equal to the
probability that there are no atoms above $x$. Defining $M_K$ to be the supremum of the points in this
process, combined with the basic properties of Poisson point processes, we have
\[
\mathbb{P}\left(M_K \le x\right) = \exp\left( -\int_x^\infty \int e^{-(t - w) - e^{-(t - w)}}\, e^{w} \nu_K(e^{w})\, dw\, dt \right).
\]
Using Fubini's theorem to swap the integrals, we can evaluate the inner integral analytically by
noting the integrand is a Gumbel($w$, 1) density,
\[
\begin{aligned}
\mathbb{P}\left(M_K \le x\right)
&= \exp\left( -\int \left( 1 - e^{-e^{-(x - w)}} \right) e^{w} \nu_K(e^{w})\, dw \right) \\
&= \exp\left( \int \left( e^{-\theta e^{-x}} - 1 \right) \nu_K(d\theta) \right).
\end{aligned}
\]

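We can take the derivative with respect to $x$ to obtain its density with respect to the Lebesgue measure.
The above derivation holds true for $\Theta_K^+$, replacing $\nu_K$ with $\nu_K^+$ and $M_K$ with $M_K^+$. Therefore,
using the Gumbel-max trick from Lemma 4.4.3, we substitute the results for $M_K$ and $M_K^+$ into the
original bound, yielding
\[
\begin{aligned}
\tfrac{1}{2} \left\| p_Y - p_W \right\|_1
&\le 1 - \mathbb{P}\left(X_1 \in \operatorname{supp}(\Theta_K)\right)^{N} \\
&= 1 - \left(1 - \mathbb{P}\left(M_K < M_K^+\right)\right)^{N},
\end{aligned}
\]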

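where (using the substitution $t = e^{-x}$, and the fact that $\theta e^{-\theta t}$ is dominated by $\theta$ to swap integration
and differentiation)
\[
\begin{aligned}
\mathbb{P}\left(M_K < M_K^+\right)
&= \int_{-\infty}^{\infty} \mathbb{P}\left(M_K \le x\right) \frac{d}{dx} \mathbb{P}\left(M_K^+ \le x\right) dx \\
&= -\int_0^\infty e^{\int (e^{-\theta t} - 1)\, \nu_K(d\theta)}\, \frac{d}{dt}\left[ e^{\int (e^{-\theta t} - 1)\, \nu_K^+(d\theta)} \right] dt \\
&= -\int_0^\infty e^{\int (e^{-\theta t} - 1)\, (\nu_K + \nu_K^+)(d\theta)}\, \frac{d}{dt}\left( \int \left( e^{-\theta t} - 1 \right) \nu_K^+(d\theta) \right) dt \\
&= \int_0^\infty e^{\int (e^{-\theta t} - 1)\, (\nu_K + \nu_K^+)(d\theta)} \left( \int \theta e^{-\theta t}\, \nu_K^+(d\theta) \right) dt.
\end{aligned}
\]
The fact that the bound lies between 0 and 1 is a simple consequence of the fact that
$\mathbb{P}(X_1 \in \operatorname{supp}(\Xi_K)) \in [0, 1]$. The fact that the error bound asymptotically approaches 0 is a
consequence of the monotone convergence theorem applied to the decreasing sequence of functions
$\int \theta e^{-\theta t}\, \nu_K^+(d\theta)$. □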
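The final integral can be sanity-checked on a toy case with finite rate measures. As an assumption made only for illustration (it is not a representation discussed in the text), take $\nu_K(d\theta) = c_1 e^{-\theta} d\theta$ and $\nu_K^+(d\theta) = c_2 e^{-\theta} d\theta$. Each process then has a Poisson number of atoms with i.i.d. Exp(1) rates, and the integral evaluates to $\frac{c_2}{c_1 + c_2}\left(1 - e^{-(c_1 + c_2)}\right)$ via the change of variables $u = t / (1 + t)$. The sketch below compares that value with a direct simulation of the event $M_K < M_K^+$; all function names and constants are hypothetical.

```python
import numpy as np

def closed_form(c_head, c_tail):
    """Value of the bound's integral when both rate measures are c * exp(-theta) dtheta."""
    c = c_head + c_tail
    return c_tail / c * (1.0 - np.exp(-c))

def simulate_tail_win_probability(c_head, c_tail, num_trials=50_000, seed=0):
    """Monte Carlo estimate of P(M_K < M_K^+) for the toy finite rate measures."""
    rng = np.random.default_rng(seed)

    def perturbed_max(mass):
        # Number of atoms is Poisson(mass); each rate is an independent Exp(1) draw.
        n = rng.poisson(mass)
        if n == 0:
            return -np.inf  # supremum over an empty set of points
        rates = rng.exponential(1.0, size=n)
        # Log-transform the rates and perturb with iid Gumbel(0, 1) noise.
        return np.max(np.log(rates) + rng.gumbel(size=n))

    wins = 0
    for _ in range(num_trials):
        wins += int(perturbed_max(c_head) < perturbed_max(c_tail))
    return wins / num_trials

c_head, c_tail = 3.0, 0.5   # illustrative total masses of nu_K and nu_K^+
estimate = simulate_tail_win_probability(c_head, c_tail)
exact = closed_form(c_head, c_tail)
print(estimate, exact)
```

The two printed numbers should agree up to Monte Carlo error. The factor $1 - e^{-(c_1 + c_2)}$ accounts for the case in which neither process has any atoms, where the strict inequality fails.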

A.4.3  Truncation  with  hyperpriors 

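Proof of Proposition 4.4.6. By repeating the proof of Lemma 4.4.1 in Appendix A.3.1, except with
an additional use of the tower property to condition on the hyperparameters $\Phi$, an additional use
of Fubini's theorem to swap integration and expectation, and Jensen's inequality, we have
\[
\begin{aligned}
\tfrac{1}{2} \left\| p_Y - p_W \right\|_1
&\le \mathbb{E}\left[1 - \mathbb{P}\left(X_{1:N} \subseteq \operatorname{supp}(\Theta_K) \mid \Phi\right)\right] \\
&\le 1 - \mathbb{E}\left[\left(1 - B_K(\Phi)\right)^{N}\right] \\
&\le 1 - \left(1 - \mathbb{E}\left[B_K(\Phi)\right]\right)^{N}.
\end{aligned}
\]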


□ 
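The last inequality in the proof above is Jensen's inequality applied to the convex map $b \mapsto (1 - b)^N$ on $[0, 1]$. A small numerical sketch of that step follows; the hyperprior and the form of the conditional bound used here are hypothetical stand-ins chosen only to produce values in $[0, 1]$, not quantities from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20                                                # number of observations (illustrative)
phi = rng.gamma(shape=2.0, scale=1.0, size=10_000)    # hypothetical hyperprior draws
bound_given_phi = 1.0 - np.exp(-0.01 * phi)           # hypothetical B_K(phi), lies in [0, 1]

# 1 - E[(1 - B_K(Phi))^N]: average the per-hyperparameter bound.
conditional_avg = 1.0 - np.mean((1.0 - bound_given_phi) ** N)
# 1 - (1 - E[B_K(Phi)])^N: plug the averaged B_K into the bound.
plug_in = 1.0 - (1.0 - np.mean(bound_given_phi)) ** N
print(conditional_avg, plug_in)
```

The first printed value never exceeds the second, so the plug-in form is the looser of the two bounds; its appeal is that it needs only the expectation of $B_K(\Phi)$.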
List of Acronyms

Acronym    Full Phrase
BNP        Bayesian nonparametrics
BP         Beta process
BPP        Beta prime process
B-rep      Bondesson representation
CDF        Cumulative distribution function
CRM        Completely random measure
DB-Rep     Decoupled Bondesson representation
DP         Dirichlet process
ΓP         Gamma process
IL-Rep     Inverse-Lévy representation
LP         Likelihood process
LomP       Lomax process
MCMC       Markov Chain Monte Carlo
NCRM       Normalized completely random measure
NΓP        Normalized gamma process
PL-rep     Power-law representation
R-rep      Rejection representation
SB-rep     Size-biased representation
T-rep      Thinning representation
VB         Variational Bayes